Symbol dictionary compiling method and symbol dictionary retrieving method

ABSTRACT

Even when a character string is long and the symbols to be retrieved contain characters or character chains of high appearance frequency, high speed retrieval is possible up to and including infix matching, and a symbol dictionary of small capacity can be compiled. In the symbol dictionary compiling method of the invention, each symbol in the symbol data is covered with shorter symbols called “meta-symbols”, and the information showing how each symbol is covered is obtained by preparing meta-symbol appearance information recorded for each meta-symbol.

This application is a divisional of patent application Ser. No. 09/451,047, filed Nov. 30, 1999.

FIELD OF THE INVENTION

The present invention relates to compilation and retrieval of a symbol dictionary for use in a database device or a document retrieval device for controlling and retrieving accumulated electronic symbol information by using a computer.

BACKGROUND OF THE INVENTION

With the widespread use of word processors and personal computers, development of large capacity and low price memory media such as CD-ROM, and advancement of Ethernet networking, database systems such as relational databases and full text retrieval databases have come to be widely used.

Databases handle a relatively short character string of several characters to hundreds of characters, such as a person's name, place name, organization name, address, classification code or part code, as a symbol. They store a CSV list of symbols (a string of symbols connected by commas, such as “MorishitaElectricIndustries,MorishitaCommunicationsIndustries, KyushuMorishitaElectric” as a field of trading partner company names) in one item (field) of the database, search for records which contain a complete match, a prefix match, a postfix match or an infix match to the query symbols, and retrieve those records at high speed (for example, in the case of prefix matching, the retrieval condition is “retrieve the record containing a symbol starting with “Morishita” in the trading partner company names field”).

Of these four methods of matching, an efficient retrieval method for complete matching and prefix matching is realized by using the data structure called TRIE (also known as radix searching tree), as mentioned in publications such as “Algorithm Vol. 2” (R. Seziwick, tr. by Kohei Noshita, et al., Kindai Kagakusha, 1992, ISBN 4-7649-0189-7, pp. 52-72) and “Algorithm and Data Structure Handbook” (G. H. Gonnet, tr. Mitsuo Gen, et al., Keigaku Shuppan, 1987, ISBN 4-7665-0326-0, pp. 111-122). In addition, where postfix matching is needed, a TRIE may be constructed for data in which the character sequence of each symbol is reversed, and that TRIE may be retrieved.
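As an illustration of the TRIE approach described above, the following is a minimal sketch in Python. It is not taken from the cited publications; the class name `Trie`, its methods, the helper `postfix_match` and the sample symbols are all hypothetical. It shows complete and prefix matching on a forward TRIE, and postfix matching on a second TRIE built from reversed symbols.

```python
class Trie:
    """Minimal character trie over a set of symbols (illustrative only)."""

    def __init__(self):
        self.children = {}
        self.symbol_numbers = []  # numbers of symbols ending exactly at this node

    def insert(self, symbol, number):
        node = self
        for ch in symbol:
            node = node.children.setdefault(ch, Trie())
        node.symbol_numbers.append(number)

    def _find(self, text):
        """Walk down the trie along `text`; return the node reached or None."""
        node = self
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def complete_match(self, query):
        node = self._find(query)
        return sorted(node.symbol_numbers) if node is not None else []

    def prefix_match(self, query):
        node = self._find(query)
        if node is None:
            return []
        found, stack = [], [node]
        while stack:  # collect every symbol stored below the reached node
            current = stack.pop()
            found.extend(current.symbol_numbers)
            stack.extend(current.children.values())
        return sorted(found)


# Hypothetical sample data, numbered from 1.
symbols = ["MorishitaElectric", "MorishitaCommunications", "KyushuMorishitaElectric"]
forward, backward = Trie(), Trie()
for number, symbol in enumerate(symbols, start=1):
    forward.insert(symbol, number)
    backward.insert(symbol[::-1], number)  # reversed symbols for postfix matching


def postfix_match(query):
    """Postfix matching is prefix matching on the reversed data."""
    return backward.prefix_match(query[::-1])
```

In this sketch neither TRIE can answer an infix query such as “Morishita” occurring in the middle of “KyushuMorishitaElectric”, which is the limitation the following paragraphs address.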

If infix matching is desired, efficient retrieval processing is difficult by TRIE, and conventionally, for example, a method as disclosed in Japanese Laid-open Patent No. Hei 3-42774 has been employed.

In the method disclosed in Japanese Laid-open Patent No. Hei 3-42774, when compiling a symbol dictionary, a symbol character string is divided character by character, and dictionary information recording a pair of the symbol number and the appearance character position of the corresponding character in the symbol is created for every character. When retrieving a symbol dictionary, a query character string is decomposed character by character, the dictionary information corresponding to each character is retrieved, and the set of symbol numbers whose entries are identical in symbol number and consecutive in appearance character position is issued as a retrieval result.
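The conventional character-by-character method just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the cited publication: the function names `compile_char_index` and `infix_search` and the sample symbols are hypothetical, and positions are counted from 1.

```python
from collections import defaultdict


def compile_char_index(symbols):
    """Map each character to the set of (symbol number, position) pairs where it occurs."""
    index = defaultdict(set)
    for number, symbol in enumerate(symbols, start=1):
        for position, ch in enumerate(symbol, start=1):
            index[ch].add((number, position))
    return index


def infix_search(index, query):
    """Return the numbers of symbols containing `query` at consecutive positions."""
    if not query:
        return set()
    # Candidate (symbol number, start position) pairs come from the first character.
    candidates = set(index.get(query[0], ()))
    for offset, ch in enumerate(query[1:], start=1):
        postings = index.get(ch, set())
        # Keep a candidate only if the next character appears at the next position.
        candidates = {(n, p) for (n, p) in candidates if (n, p + offset) in postings}
        if not candidates:
            break
    return {n for (n, _) in candidates}


# Hypothetical sample data.
symbols = ["199800000123A", "199800000456B", "200100000123A"]
index = compile_char_index(symbols)
```

Even for these three symbols, the entry for the character “0” holds 17 pairs, which hints at why symbols containing many high-frequency characters make the intermediate data large.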

In this conventional compiling method of a symbol dictionary, however, when the types of symbols are more than tens of thousands, the symbol dictionary file to be compiled is more than twice as large as the symbol data to be retrieved, and it is difficult to utilize if the usable capacity of the memory device is limited.

Also, in the conventional retrieving method of a symbol dictionary, if we retrieve a symbol which is long and contains many high-frequency characters, the quantity of intermediate data to be read out from the symbol dictionary is tremendous, and the retrieval speed is reduced due to such read operations and the checking of consecutive positions.

The disadvantage of the conventional retrieving method of a symbol dictionary may be somewhat alleviated by recording the symbol dictionary in units of N consecutive characters, or “N-grams”, instead of creating and recording dictionary information for every single character. However, in the case of retrieving a symbol such as “199800000123A”, which begins with the year and is followed by a multi-digit integer mostly composed of consecutive zeros, there are many symbols incidentally coinciding in the beginning 10 characters or more, and if N is about 2 to 4 in the N-gram, the amount of data to be read out from the symbol dictionary is still large and the retrieval speed is reduced.

Further, as the number N in the character chain is increased, the types of appearing N-character chains increase abruptly, so that it is hard to compile a symbol dictionary, and the capacity of the compiled symbol dictionary increases due to the housekeeping information. In the conventional retrieval method of a symbol dictionary, when we retrieve a symbol which is long and contains many high-frequency characters, complete matching takes the longest processing time among the four matching modes, and in an application where complete matching occupies the majority of queries, the average retrieval speed is reduced.

Thus, in the conventional compiling method of a symbol dictionary, the symbol dictionary file to be compiled is more than twice as large as the symbol data to be retrieved, and it is difficult to utilize if the usable capacity of the memory device is limited.

Moreover, in the conventional retrieval method of a symbol dictionary, if we retrieve a symbol which is long and contains many high-frequency characters, the amount of data to be read out from the symbol dictionary is tremendous, and the retrieval speed is reduced.

If the number of character chains N is increased, the types of appearing N-character chains increase abruptly, it is hard to compile a symbol dictionary with small housekeeping information, and the capacity of the compiled symbol dictionary increases.

In a compiling method of a symbol dictionary of the invention, a meta-symbol dictionary gathering shorter symbols called “meta-symbols” for covering symbols in symbol data is compiled automatically, each symbol in the symbol data is covered with the meta-symbols in this meta-symbol dictionary, and meta-symbol appearance information, which records for every meta-symbol how each symbol is covered, is compiled, so that retrieval is possible at high speed including up to infix matching and the size of the compiled symbol dictionary can be reduced. In a retrieving method of a symbol dictionary of the invention, a query string is covered with meta-symbols by retrieving the meta-symbol dictionary contained in the compiled symbol dictionary file, retrieval results of both right and left extension meta-symbols of the original covering meta-symbols are added to this covering result, and high speed retrieval is possible for all matching modes including infix matching by seeking the symbol number set commonly contained in every element set in the covering results covering the query string or its right and left extension character strings. Moreover, in an application where complete matching occupies the majority of queries, symbol retrieval is possible without decreasing the average retrieval speed.

SUMMARY OF THE INVENTION

A compiling method of a symbol dictionary according to a first aspect of the invention comprises a symbol covering means for retrieving each symbol in symbol data by searching a meta-symbol dictionary and finding the covering result by an extraction method such as the maximal word extraction method, meta-symbol accumulating means for accumulating covering results, a meta-symbol frequency table for accumulating the total appearance frequency of each meta-symbol in the symbol data, meta-symbol dictionary update judging means for adding or deleting meta-symbols in the meta-symbol dictionary, and for deciding to stop the meta-symbol accumulation, by referring to the meta-symbol frequency table and conforming to the predetermined condition/parameters, meta-symbol appearance information compiling means for calculating the meta-symbol appearance information recording the number of the symbol containing each meta-symbol and the appearance character position from the recovering results, and symbol dictionary compiling means for compiling a machine-retrievable symbol dictionary from the meta-symbol dictionary and meta-symbol appearance information. Thus, even if we retrieve a symbol which is long and contains many high-frequency characters, high speed retrieval is possible up to infix matching, and the size of the compiled symbol dictionary can be reduced.
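The covering, frequency table and appearance information named in the first aspect can be pictured with a small Python sketch. It rests on assumptions that are not in the text: the function names `cover` and `compile_appearance`, the sample symbols and the sample meta-symbol dictionary are hypothetical, and the covering rule used here (take the longest dictionary entry at each start position, then drop any match lying inside an earlier match) is only a stand-in for the maximal word extraction method, whose details are not given in this section.

```python
from collections import Counter, defaultdict


def cover(text, meta_symbols):
    """Cover `text` with dictionary entries; return (meta-symbol, start) pairs, 1-based."""
    max_len = max(map(len, meta_symbols))
    spans = []
    for start in range(len(text)):
        # Longest dictionary entry beginning at this position, if any.
        for length in range(min(max_len, len(text) - start), 0, -1):
            if text[start:start + length] in meta_symbols:
                spans.append((start, start + length))
                break
    result, furthest_end = [], 0
    for s, e in spans:  # spans are in ascending order of start position
        if e > furthest_end:  # not contained in any earlier span
            result.append((text[s:e], s + 1))
            furthest_end = e
    return result


def compile_appearance(symbols, meta_symbols):
    """Build the frequency table and the per-meta-symbol appearance information."""
    frequency = Counter()
    appearance = defaultdict(list)
    for number, symbol in enumerate(symbols, start=1):
        for meta, position in cover(symbol, meta_symbols):
            frequency[meta] += 1
            appearance[meta].append((number, position))
    return frequency, dict(appearance)


# Hypothetical data: every single character is registered, plus three longer entries.
symbols = ["ABC12", "AB12"]
meta_symbols = {"A", "B", "C", "1", "2", "AB", "BC", "12"}
frequency, appearance = compile_appearance(symbols, meta_symbols)
```

In this example the symbol “ABC12” is covered by “AB” at position 1, “BC” at position 2 and “12” at position 4, so the appearance information for “12” records symbol 1 at position 4 and symbol 2 at position 3.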

A retrieval method of a symbol dictionary for complete matching according to a second aspect of the invention comprises a query string covering means for retrieving a meta-symbol dictionary in a symbol dictionary, and finding the covering result from the query string by the duplicate longest match word extraction method, meta-symbol appearance information retrieval means for retrieving meta-symbol appearance information in the symbol dictionary, and finding a set of symbol numbers containing the desired meta-symbol at the corresponding position from each element in the covering result obtained at a first step, and a symbol number assessing means for finding a common portion of corresponding symbol number sets in all elements in the covering result, and, if the found common portion is not empty, issuing the symbol number of the element as retrieval result and terminating the retrieval process, or if the set is empty, terminating the retrieval processing assuming there is no retrieval result, in which if the character string is long and when retrieving a symbol containing characters or character chain of high frequency, high speed symbol dictionary retrieval is possible by complete matching.
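The “common portion” step of the second aspect can be sketched as a set intersection. This is a minimal illustration rather than the method as implemented: the names `symbols_with` and `complete_match` and all data are hypothetical, the query covering is supplied by hand, and the `symbol_length` table used to reject longer symbols that share the same meta-symbols at the same positions is an assumption that does not appear in the text.

```python
def symbols_with(appearance, meta_symbol, position):
    """Numbers of symbols in which `meta_symbol` appears at `position`."""
    return {n for (n, pos) in appearance.get(meta_symbol, ()) if pos == position}


def complete_match(appearance, symbol_length, query_covering, query_length):
    """Intersect the symbol number sets of all covering elements of the query."""
    common = None
    for meta_symbol, position in query_covering:
        found = symbols_with(appearance, meta_symbol, position)
        common = found if common is None else common & found
        if not common:
            return set()  # empty common portion: no retrieval result
    # Assumption (not from the text): stored lengths filter out longer symbols.
    return {n for n in (common or set()) if symbol_length[n] == query_length}


# Hypothetical appearance information for symbols 1 "ABC12", 2 "AB12", 3 "ABC12C".
appearance = {
    "AB": [(1, 1), (2, 1), (3, 1)],
    "BC": [(1, 2), (3, 2)],
    "12": [(1, 4), (2, 3), (3, 4)],
    "C": [(3, 6)],
}
symbol_length = {1: 5, 2: 4, 3: 6}

# Covering of the query "ABC12" as (meta-symbol, start position) pairs.
result = complete_match(appearance, symbol_length, [("AB", 1), ("BC", 2), ("12", 4)], 5)
```

Here the intersection alone yields symbols 1 and 3, and the assumed length check narrows the result to symbol 1.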

A retrieving method of a symbol dictionary of prefix matching according to a third aspect of the invention comprises question character string covering means for retrieving a meta-symbol dictionary in a symbol dictionary and finding the covering from the question character string Q in the retrieval condition by the overlapped longest match word extraction method, and, if there is no covering, terminating the retrieval processing assuming there is no retrieval result, or, if there is covering, recording the covering result, right extended meta-symbol assessing means for retrieving meta-symbol information in the symbol dictionary, retrieving, in the covering result, all meta-symbols x of right extended meta-symbols (that is, meta-symbols containing character string R in the beginning portion) of j-th rightmost portion character string of the question character string (that is, the partial character string from the j-th character (1≦j≦|Q|) to the final character in the question character string) R, out of extended meta-symbols of meta-symbol Z of covering elements of which collating start character position is 1 (that is, meta-symbols containing Z), and adding elements (x, j, |R|+j) to the covering result and recording, and symbol number set assessing means for retrieving meta-symbol appearance information in the symbol dictionary while systematically compiling a set C of elements in the covering result covering the question character string or an arbitrary right extended character string, collecting a symbol number set SC commonly contained in all elements of C, recording as part of retrieval result, and issuing the sum set of all SCs as final retrieval result, in which if the character string is long and when retrieving a symbol containing characters or character chain of high frequency, high speed symbol dictionary retrieval is possible by prefix matching.

A retrieving method of a symbol dictionary of postfix matching according to a fourth aspect of the invention comprises question character string covering means for retrieving a meta-symbol dictionary in a symbol dictionary, and finding the covering from the question character string Q in the retrieval condition by the overlapped longest match word extraction method, and, if there is no covering, terminating the retrieval processing assuming there is no retrieval result, or, if there is covering, recording the covering result, left extended meta-symbol assessing means for retrieving meta-symbol information in the symbol dictionary, retrieving, in the covering result, all meta-symbols x of left extended meta-symbols (that is, meta-symbols containing character string L in the end portion) of j-th leftmost portion character string of the question character string (that is, the partial character string from the first character to the j-th character (1≦j≦|Q|) in the question character string) L, out of extended meta-symbols of meta-symbol Z of covering elements of which collating end character position is |Q|+1 (that is, meta-symbols containing Z), and adding elements (x, j+1−|L|, j+1) to the covering result and recording, and symbol number set assessing means for retrieving meta-symbol appearance information in the symbol dictionary while systematically compiling a set C of elements in the covering result covering the question character string or an arbitrary left extended character string, collecting a symbol number set SC commonly contained in all elements of C, recording as part of retrieval result, and issuing the sum set of all SCs as final retrieval result, in which if the character string is long and when retrieving a symbol containing characters or character chain of high frequency, high speed symbol dictionary retrieval is possible by postfix matching.

A retrieving method of a symbol dictionary of infix matching according to a fifth aspect of the invention comprises question character string covering means for retrieving a meta-symbol dictionary in a symbol dictionary, and finding the covering from the question character string Q in the retrieval condition by the overlapped longest match word extraction method, and, if there is no covering, terminating the retrieval processing assuming there is no retrieval result, or, if there is covering, recording the covering result, right extended meta-symbol assessing means for retrieving meta-symbol information in the symbol dictionary, retrieving, in the covering result, all meta-symbols x of right extended meta-symbols (that is, meta-symbols containing character string R in the beginning portion) of j-th rightmost portion character string of the question character string (that is, the partial character string from the j-th character (1≦j≦|Q|) to the final character in the question character string) R, out of extended meta-symbols of meta-symbol Z of covering elements of which collating start character position is 1 (that is, meta-symbols containing Z), and adding elements (x, j, |R|+j) to the covering result and recording, left extended meta-symbol assessing means for retrieving meta-symbol information in the symbol dictionary, retrieving, in the covering result, all meta-symbols x of left extended meta-symbols (that is, meta-symbols containing character string L in the end portion) of j-th leftmost portion character string of the question character string (that is, the partial character string from the first character to the j-th character (1≦j≦|Q|) in the question character string) L, out of extended meta-symbols of meta-symbol Z of covering elements of which collating end character position is |Q|+1 (that is, meta-symbols containing Z), and adding elements (x, j+1−|L|, j+1) to the covering result and recording, both extended meta-symbol assessing means for retrieving the meta-symbol dictionary, retrieving all of both extended meta-symbols x of Q, adding elements (x, 1−j, 1−j+|x|) to the covering result and recording, and symbol number set assessing means for retrieving meta-symbol onset information in the symbol dictionary while systematically compiling a set C of elements in the covering result covering the question character string or an arbitrary extended character string, collecting a symbol number set SC commonly contained in all elements of C, recording as part of retrieval result, and issuing the sum set of all SCs as final retrieval result, in which if the character string is long and when retrieving a symbol containing characters or character chain of high frequency, high speed symbol dictionary retrieval is possible by infix matching.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing a general constitution of a symbol dictionary compiling apparatus in a first embodiment of the invention.

FIG. 2 is a block diagram showing a general constitution of a symbol dictionary retrieving apparatus in a second embodiment of the invention.

FIG. 3 is a block diagram showing a general constitution of a symbol dictionary retrieving apparatus in a third embodiment of the invention.

FIG. 4 is a block diagram showing a general constitution of a symbol dictionary retrieving apparatus in a fourth embodiment of the invention.

FIG. 5 is a block diagram showing a general constitution of a symbol dictionary retrieving apparatus in a fifth embodiment of the invention.

FIG. 6 is a flowchart describing the procedure of covering process by symbol covering means in the first embodiment.

FIG. 7 is a flowchart describing the procedure of covering process by question character string covering means in the second to fifth embodiments.

FIG. 8 is a flowchart describing the procedure of symbol number assessing process by symbol number assessing means in the second embodiment.

FIG. 9 is a flowchart describing the procedure of symbol number set assessing process by symbol number set assessing means in the third embodiment.

FIG. 10 is a flowchart describing the procedure of symbol number set assessing process by symbol number set assessing means in the third embodiment.

FIG. 11 is a flowchart describing the procedure of symbol number set assessing process by symbol number set assessing means in the fourth embodiment.

FIG. 12 is a flowchart describing the procedure of symbol number set assessing process by symbol number set assessing means in the fourth embodiment.

FIG. 13 is a flowchart describing the procedure of symbol number set assessing process by symbol number set assessing means in the fifth embodiment.

FIG. 14 is a flowchart describing the procedure of symbol number set assessing process by symbol number set assessing means in the fifth embodiment.

FIG. 15 is an example of symbol data in the first embodiment.

FIG. 16 is an example of registered content of meta-symbol dictionary in initial stage in the first embodiment.

FIG. 17 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 18 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 19 is an example of meta-symbol dictionary in the first embodiment.

FIG. 20 is an example of part of meta-symbol dictionary in the first embodiment.

FIG. 21 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 22 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 23 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 24 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 25 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 26 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 27 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 28 is an example of content of meta-symbol frequency table in the first embodiment.

FIG. 29 is an example of content of meta-symbol onset information in the first embodiment.

FIG. 30 is a conceptual diagram showing an example of data structure of meta-symbol dictionary in the first embodiment.

FIG. 31 is a conceptual diagram showing an example of data structure of meta-symbol dictionary in the first embodiment.

FIG. 32 is a conceptual diagram showing an example of data structure of meta-symbol dictionary in the first embodiment.

FIG. 33 is an example of content of extended information of meta-symbol in the first embodiment.

FIG. 34 is a conceptual diagram describing principal intermediate data in a process of symbol dictionary retrieval in the second embodiment.

FIG. 35 is a conceptual diagram describing principal intermediate data in a process of symbol dictionary retrieval in the third embodiment.

FIG. 36 is a conceptual diagram describing principal intermediate data in a process of symbol dictionary retrieval in the third embodiment.

FIG. 37 is a conceptual diagram describing principal intermediate data in a process of symbol dictionary retrieval in the fourth embodiment.

FIG. 38 is a conceptual diagram describing principal intermediate data in a process of symbol dictionary retrieval in the fifth embodiment.

FIG. 39 is a conceptual diagram describing principal intermediate data in a process of symbol dictionary retrieval in the fifth embodiment.

REFERENCE NUMERALS

-   101 Symbol data
-   102 Meta-symbol dictionary
-   103 Symbol covering means
-   104 Meta-symbol summing means
-   105 Meta-symbol frequency table
-   106 Meta-symbol dictionary update judging means
-   107 Meta-symbol appearance information compiling means
-   108 Meta-symbol appearance information
-   109 Symbol dictionary compiling means
-   110 Symbol dictionary
-   201 Symbol dictionary
-   202 Retrieval condition input means
-   203 Question character string covering means
-   204 Covering result
-   205 Symbol number assessing means
-   206 Retrieval result output means
-   301 Symbol dictionary
-   302 Retrieval condition input means
-   303 Question character string covering means
-   304 Covering result
-   305 Symbol number set assessing means
-   306 Retrieval result output means
-   307 Right extended meta-symbol assessing means
-   401 Symbol dictionary
-   402 Retrieval condition input means
-   403 Question character string covering means
-   404 Covering result
-   405 Symbol number set assessing means
-   406 Retrieval result output means
-   408 Left extended meta-symbol assessing means
-   501 Symbol dictionary
-   502 Retrieval condition input means
-   503 Question character string covering means
-   504 Covering result
-   505 Symbol number set assessing means
-   506 Retrieval result output means
-   507 Right extended meta-symbol assessing means
-   508 Left extended meta-symbol assessing means
-   509 Both extended meta-symbol assessing means

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An exemplary embodiment of the present invention relates to a compiling method of a symbol dictionary, being a method of compiling a machine-retrievable symbol dictionary for symbol data, registering a finite number of mutually different symbols, each being an array of not more than N (a specific number of) characters contained in a certain determined character set, comprising a first step of symbol dictionary compilation in which symbol covering means retrieves each symbol in the symbol data in a prepared meta-symbol dictionary in initial state, and searches covering (that is, a set of pairs, each consisting of a meta-symbol collating with a partial character string in the symbol to be covered and its collating start character position, such that every character in the symbol to be covered is contained in at least one of the pairs), meta-symbol summing means sums up covering results, total appearance frequency of each meta-symbol in the symbol data is accumulated in a meta-symbol frequency table, meta-symbol dictionary update judging means refers to the meta-symbol frequency table, and after deleting the meta-symbol from the meta-symbol dictionary according to a predetermined standard, adds the meta-symbol to the meta-symbol dictionary according to a predetermined standard, a second step of symbol dictionary compilation in which the symbol covering means retrieves each symbol in the symbol data in the meta-symbol dictionary at the present to search the covering, the meta-symbol summing means sums up the covering results, the total appearance frequency of each meta-symbol in the symbol data is accumulated in the meta-symbol frequency table, the meta-symbol dictionary update judging means refers to the meta-symbol frequency table, adds the meta-symbol to the meta-symbol dictionary according to a predetermined standard, judges if the predetermined stopping condition is satisfied or not, and repeats the second step until satisfying the stopping condition, a third step of symbol dictionary compilation in which the meta-symbol dictionary update judging means refers to the meta-symbol frequency table, and deletes the meta-symbol from the meta-symbol dictionary according to a predetermined standard, a fourth step of symbol dictionary compilation in which the symbol covering means covers the symbol data by using the meta-symbol dictionary calculated at the third step, and the meta-symbol appearance information compiling means calculates the meta-symbol appearance information recording the symbol number for showing each meta-symbol and the appearance character position from the covering result, and a fifth step of symbol dictionary compilation in which the symbol dictionary compiling means compiles a machine-retrievable symbol dictionary storing meta-symbol information and meta-symbol appearance information from the meta-symbol dictionary and meta-symbol appearance information, and therefore if the character string is long and when retrieving a symbol containing characters or character chain of high frequency, high speed symbol dictionary retrieval is possible including up to infix matching, and a symbol dictionary of small capacity can be compiled.

The invention further relates to the compiling method of symbol dictionary in which covering of symbol is determined by maximal word extraction method in the symbol covering means.

The invention further relates to the compiling method of symbol dictionary in which a symbol composed of one character only, for each character in a predetermined character set, and zero or more character strings known as part of the symbol in the symbol data are registered in the prepared meta-symbol dictionary in initial state.

The invention further relates to the compiling method of symbol dictionary in which the deletion of a meta-symbol in the first step is done on the basis of deleting the meta-symbol of which frequency in the meta-symbol frequency table is 0, and the addition of a meta-symbol in the first step is done on the basis of adding the meta-symbol by adding one arbitrary character in the meta-symbol dictionary at the end, as for a meta-symbol less than N characters in the meta-symbol frequency table of which frequency is frequency C1 or more determined by the symbol data content.

The invention further relates to the compiling method of symbol dictionary in which the addition of a meta-symbol in the second step is done on the basis of adding the meta-symbol by adding one arbitrary character in the predetermined character set at the end, as for a meta-symbol less than N characters in the meta-symbol frequency table of which frequency is frequency Ck or more determined by the symbol data content and the number of times of repetition of the second step, and the deletion of a meta-symbol in the third step is done on the basis of deleting a meta-symbol of which frequency in the meta-symbol frequency table is less than E and two characters or more.
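The addition and deletion standards described in the preceding paragraphs can be sketched as three small set operations. This is an illustration under assumptions, not the update judging means itself: the function names `drop_unused`, `extend_dictionary` and `prune_rare`, the parameter names, the alphabet and all frequencies are hypothetical, with `threshold` standing in for C1 or Ck, `max_length` for N and `minimum` for E.

```python
def drop_unused(meta_symbols, frequency):
    """First-step deletion: remove meta-symbols whose frequency is 0."""
    return {m for m in meta_symbols if frequency.get(m, 0) > 0}


def extend_dictionary(meta_symbols, frequency, alphabet, threshold, max_length):
    """Addition: append one character to each frequent meta-symbol shorter than max_length."""
    added = {
        meta + ch
        for meta in meta_symbols
        if len(meta) < max_length and frequency.get(meta, 0) >= threshold
        for ch in alphabet
    }
    return meta_symbols | added


def prune_rare(meta_symbols, frequency, minimum):
    """Third-step deletion: remove meta-symbols of two or more characters with frequency below minimum."""
    return {m for m in meta_symbols if len(m) < 2 or frequency.get(m, 0) >= minimum}


# Hypothetical example: three single-character meta-symbols and made-up frequencies.
alphabet = "AB1"
initial = {"A", "B", "1"}
first_pass = {"A": 5, "B": 1, "1": 0}

kept = drop_unused(initial, first_pass)
grown = extend_dictionary(kept, first_pass, alphabet, threshold=3, max_length=4)

later_pass = {"A": 2, "B": 1, "AB": 4, "AA": 0, "A1": 1}
final = prune_rare(grown, later_pass, minimum=2)
```

Single-character meta-symbols survive `prune_rare` regardless of frequency, matching the statement that only meta-symbols of two characters or more are deleted in the third step.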

The invention further relates to the compiling method of a symbol dictionary in which the stopping condition in the second step is the condition of stopping when there is no addition or deletion of a meta-symbol in the meta-symbol dictionary update judging means.

The invention further relates to the compiling method of symbol dictionary, in which the sequence number of the corresponding symbol in the symbol data is used as the symbol number in the third step.

The invention further relates to a retrieving method of a symbol dictionary, being a method of retrieving complete coincidence (that is, retrieving a same symbol as question character string) of an arbitrary character string using a symbol dictionary storing meta-symbol information and meta-symbol appearance information, comprising a first step of symbol dictionary retrieval in which question character string covering means retrieves meta-symbol information in the symbol dictionary, and searches covering in the question character string Q of retrieval condition by overlapped longest match word extraction method (that is, a set of covering elements, each being a triple (m, s, e) of a meta-symbol m collating with a partial character string, a collating start character position s, and a collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, such that every character of Q is contained in at least one covering element), if there is no covering (that is, no such set of covering elements exists in which every character of Q is contained in at least one covering element), the retrieval process is terminated as being no retrieval result, and if there is covering, the covering result is stored in the working range, and a second step of symbol dictionary retrieval in which symbol number set assessing means retrieves the meta-symbol onset information in the symbol dictionary, and if there is (only one) symbol number contained commonly in all elements in the covering result, it is issued as the retrieval result and the retrieval process is terminated, and if there is no symbol number contained commonly in all elements in the covering result, the retrieval process is terminated as being no retrieval result, and therefore if the character string is long and when retrieving a symbol containing characters or character chain of high frequency, high speed symbol dictionary retrieval is possible by complete matching.
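The covering condition on elements (m, s, e) stated above can be checked mechanically. The following Python sketch is only an illustration of that definition; the function name `is_covering` and the sample query are hypothetical. Positions are 1-based and e is one past the last collated character, so 1≦s<e≦|Q|+1.

```python
def is_covering(query, elements):
    """Return True if `elements` is a covering of `query` in the sense defined above."""
    covered = [False] * len(query)
    for meta, s, e in elements:
        if not (1 <= s < e <= len(query) + 1):
            return False  # positions outside the permitted range
        if query[s - 1:e - 1] != meta:
            return False  # meta-symbol does not collate with the partial string
        for i in range(s - 1, e - 1):
            covered[i] = True
    # Every character of the query must lie in at least one covering element.
    return bool(query) and all(covered)


# Hypothetical query and candidate coverings.
query = "ABC12"
full = [("AB", 1, 3), ("BC", 2, 4), ("12", 4, 6)]
gap = [("AB", 1, 3), ("12", 4, 6)]  # leaves the character "C" uncovered
```

In this example `full` satisfies the definition, while `gap` does not because the third character is contained in no element.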

The invention further relates to a retrieving method of a symbol dictionary, being a method of retrieving forward coincidence (that is, retrieving all symbols having a question character string in the beginning portion) by an arbitrary character string using a symbol dictionary storing meta-symbol information and meta-symbol appearance information, comprising a first step of symbol dictionary retrieval in which question character string covering means retrieves meta-symbol information in the symbol dictionary, and searches for a covering of the question character string Q of the retrieval condition by the overlapped longest match word extraction method (that is, a set of covering elements, each being a triple (m, s, e) of a meta-symbol m collating with a partial character string of the character string to be covered, the collating start character position s, and the collating end character position e (1≦s<e≦|Q|+1), such that every character of Q is contained in at least one covering element); if there is no covering (that is, if no such set of covering elements exists), the retrieval process is terminated as having no retrieval result, and if there is a covering, the covering result is recorded; a second step of symbol dictionary retrieval in which right extended meta-symbol assessing means retrieves meta-symbol information in the symbol dictionary, retrieves all meta-symbols x that are right extended meta-symbols (that is, meta-symbols containing character string R in the beginning portion) of the j-th rightmost portion character string R of the question character string (that is, the partial character string from the j-th character (1≦j≦|Q|) to the final character in the question character string), out of the extended meta-symbols (that is, meta-symbols containing Z) of the meta-symbols Z of those covering elements in the covering result of which the collating start character position is 1, and adds the elements (x, j, |R|+j) to the covering result and records them; and a third step of symbol dictionary retrieval in which symbol number set assessing means retrieves meta-symbol appearance information in the symbol dictionary while systematically compiling each set C of elements in the covering result that covers the question character string or an arbitrary right extended character string, collects the symbol number set SC commonly contained in all elements of C, records it as part of the retrieval result, and issues the sum set (union) of all SCs as the final retrieval result. Therefore, even if the character string is long or a symbol containing characters or character chains of high frequency is retrieved, high speed symbol dictionary retrieval is possible by prefix matching.
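The set arithmetic of the third step can be sketched as follows. This is only a minimal illustration under simplifying assumptions, not the claimed procedure itself: the candidate sets C are taken as already enumerated, the meta-symbol appearance information is modelled as a hypothetical Python dictionary keyed by (meta-symbol, left character count), and the names `retrieve_prefix`, `appearance` and `candidate_covers` are invented for this example.

```python
def retrieve_prefix(appearance, candidate_covers):
    """Union, over every candidate set C, of the symbol numbers common to all elements of C.

    appearance: {(meta_symbol, left_char_count): set of symbol numbers}  (assumed layout)
    candidate_covers: list of sets C, each a list of covering elements (m, s, e)
    """
    result = set()
    for cover in candidate_covers:
        # For prefix matching, an element starting at query position s must start
        # at the same position in the symbol, i.e. have s - 1 characters to its left.
        sets = [appearance.get((m, s - 1), set()) for (m, s, _e) in cover]
        if sets:
            result |= set.intersection(*sets)  # SC for this C
    return result


# Hypothetical data: symbol 1 is "ABCD" covered by ABC and CD; symbol 2 is "ABX"
# covered by AB and BX. "AB" is shut out of the covering of symbol 1 by "ABC".
appearance = {
    ("ABC", 0): {1}, ("CD", 2): {1},
    ("AB", 0): {2}, ("BX", 1): {2},
}
# Query "AB": its own covering, plus the right extended meta-symbol "ABC".
candidate_covers = [[("AB", 1, 3)], [("ABC", 1, 4)]]
found = retrieve_prefix(appearance, candidate_covers)
```

In this toy data, symbol 1 is found only through the right extended meta-symbol "ABC", which suggests why the second step adds such elements to the covering result before the symbol number sets are collected.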

The invention further relates to a retrieving method of a symbol dictionary, being a method of retrieving backward coincidence (that is, retrieving all symbols having a question character string in the end portion) by an arbitrary character string using a symbol dictionary storing meta-symbol information and meta-symbol appearance information, comprising a first step of symbol dictionary retrieval in which question character string covering means retrieves meta-symbol information in the symbol dictionary, and searches for a covering of the question character string Q of the retrieval condition by the overlapped longest match word extraction method (that is, a set of covering elements, each being a triple (m, s, e) of a meta-symbol m collating with a partial character string of the character string to be covered, the collating start character position s, and the collating end character position e (1≦s<e≦|Q|+1), such that every character of Q is contained in at least one covering element); if there is no covering (that is, if no such set of covering elements exists), the retrieval process is terminated as having no retrieval result, and if there is a covering, the covering result is recorded; a second step of symbol dictionary retrieval in which left extended meta-symbol assessing means retrieves meta-symbol information in the symbol dictionary, retrieves all meta-symbols x that are left extended meta-symbols (that is, meta-symbols containing character string L in the end portion) of the j-th leftmost portion character string L of the question character string (that is, the partial character string from the first character to the j-th character (1≦j≦|Q|) in the question character string), out of the extended meta-symbols (that is, meta-symbols containing Z) of the meta-symbols Z of those covering elements in the covering result of which the collating end character position is |Q|+1, and adds the elements (x, j+1−|L|, j+1) to the covering result and records them; and a third step of symbol dictionary retrieval in which symbol number set assessing means retrieves meta-symbol appearance information in the symbol dictionary while systematically compiling each set C of elements in the covering result that covers the question character string or an arbitrary left extended character string, collects the symbol number set SC commonly contained in all elements of C, records it as part of the retrieval result, and issues the sum set (union) of all SCs as the final retrieval result. Therefore, even if the character string is long or a symbol containing characters or character chains of high frequency is retrieved, high speed symbol dictionary retrieval is possible by postfix matching.

The invention further relates to a retrieving method of a symbol dictionary, being a method of retrieving intermediate coincidence (that is, retrieving all symbols having a question character string) by an arbitrary character string using a symbol dictionary storing meta-symbol information and meta-symbol appearance information, comprising a first step of symbol dictionary retrieval in which question character string covering means retrieves meta-symbol information in the symbol dictionary, and searches for a covering of the question character string Q of the retrieval condition by the overlapped longest match word extraction method (that is, a set of covering elements, each being a triple (m, s, e) of a meta-symbol m collating with a partial character string of the character string to be covered, the collating start character position s, and the collating end character position e (1≦s<e≦|Q|+1), such that every character of Q is contained in at least one covering element); if there is no covering (that is, if no such set of covering elements exists), the retrieval process is terminated as having no retrieval result, and if there is a covering, the covering result is recorded; a second step of symbol dictionary retrieval in which right extended meta-symbol assessing means retrieves meta-symbol information in the symbol dictionary, retrieves all meta-symbols x that are right extended meta-symbols (that is, meta-symbols containing character string R in the beginning portion) of the j-th rightmost portion character string R of the question character string (that is, the partial character string from the j-th character (1≦j≦|Q|) to the final character in the question character string), out of the extended meta-symbols (that is, meta-symbols containing Z) of the meta-symbols Z of those covering elements in the covering result of which the collating start character position is 1, and adds the elements (x, j, |R|+j) to the covering result and records them; a third step of symbol dictionary retrieval in which left extended meta-symbol assessing means retrieves meta-symbol information in the symbol dictionary, retrieves all meta-symbols x that are left extended meta-symbols (that is, meta-symbols containing character string L in the end portion) of the j-th leftmost portion character string L of the question character string (that is, the partial character string from the first character to the j-th character (1≦j≦|Q|) in the question character string), out of the extended meta-symbols (that is, meta-symbols containing Z) of the meta-symbols Z of those covering elements in the covering result of which the collating end character position is |Q|+1, and adds the elements (x, j+1−|L|, j+1) to the covering result and records them; a fourth step of symbol dictionary retrieval in which both extended meta-symbol assessing means retrieves the meta-symbol information, retrieves all both extended meta-symbols X of Q (that is, meta-symbols containing character string Q in the portion from the j-th character to the j+|Q|-th character, where 1<j), and adds the elements (X, 1−j, 1−j+|X|) to the covering result and records them; and a fifth step of symbol dictionary retrieval in which symbol number set assessing means retrieves meta-symbol appearance information in the symbol dictionary while systematically compiling each set C of elements in the covering result that covers the question character string or an arbitrary extended character string, collects the symbol number set SC commonly contained in all elements of C, records it as part of the retrieval result, and issues the sum set (union) of all SCs as the final retrieval result. Therefore, even if the character string is long or a symbol containing characters or character chains of high frequency is retrieved, high speed symbol dictionary retrieval is possible by infix matching.

(Embodiment 1)

A first embodiment of the invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the general constitution of an embodiment of a symbol dictionary compiling apparatus. In FIG. 1, reference numeral 101 is symbol data as the object of dictionary compilation, 102 is a meta-symbol dictionary, 103 is symbol covering means for retrieving the meta-symbol dictionary 102 and finding the covering result of each symbol in the symbol data by the maximal word extraction method, 104 is meta-symbol summing means for receiving the covering result issued by the symbol covering means 103 and summing up the meta-symbols extracted as covering elements, 105 is a meta-symbol frequency table for storing the summing result of the meta-symbol summing means 104, 106 is meta-symbol dictionary update judging means for judging addition of meta-symbols to the meta-symbol dictionary 102, deletion of meta-symbols from the meta-symbol dictionary 102, and the stopping condition of the meta-symbol dictionary update process on the basis of the summing result in the meta-symbol frequency table 105, 107 is meta-symbol appearance information compiling means for receiving the covering result issued from the symbol covering means 103 and compiling meta-symbol appearance information that records, for every meta-symbol, the symbol number of each symbol from which the meta-symbol is extracted as a covering element and the extracted character position, 108 is the meta-symbol appearance information compiled by the meta-symbol appearance information compiling means 107, 109 is symbol dictionary compiling means for compiling a retrievable symbol dictionary from the meta-symbol dictionary 102 and the meta-symbol appearance information 108, and 110 is the retrievable symbol dictionary compiled by the symbol dictionary compiling means 109. FIG. 6 is a flowchart showing the procedure by which the symbol covering means 103 finds the covering result of each symbol to be covered in the symbol data 101 by the maximal word extraction method, using the meta-symbol dictionary 102.

The operation of the symbol dictionary compiling apparatus constituted as above is explained below with reference to an example of compiling a retrievable symbol dictionary from symbol data linking the date, the time in 15-minute increments, and surnames in the Roman alphabet. FIG. 15 shows an example of symbol data. In this diagram, certain symbols are omitted, but actually a total of 1000 different symbols are stored as symbol data in the sequence of date and time. The maximum number of characters of each symbol is 1000 characters. At a first step, the additional threshold C1 of meta-symbol is 1/20 of the number of symbols. In this example, since the number of symbols is 1000, C1=50. At a second step, the additional threshold Ck of meta-symbol in the k-th repetition is 1/10 of the number of symbols as long as k<10, and 1/5 of the number of symbols if 10≦k. In this example, Ck=100 if k<10, and Ck=200 if 10≦k. At a third step, the value of threshold E is determined as 5. Herein, in the symbol data, only the two symbols “-” and “/”, the numerals “0” to “9”, and the letters “A” to “Z” of the alphabet are used, and other characters are not used. As the meta-symbol dictionary for covering such symbols, prior to compilation of the symbol dictionary, a meta-symbol dictionary consisting only of the one-character meta-symbols that can appear in the symbols is prepared as shown in FIG. 16. In order to cover efficiently in the symbol covering means, the meta-symbol dictionary has a digital tree data structure (that is, TRIE) as shown in FIG. 30. The meta-symbol of the longest match with a certain character string can be retrieved efficiently by using the TRIE, that is, by tracing the tree structure from the root of the TRIE (the left end bullet) using the first character, second character and so forth of the character string as the clues, and issuing the partial character string up to the double circle node remotest from the root.
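The longest match retrieval on a TRIE described above can be sketched as follows. This is a minimal illustration assuming a nested-dictionary representation; the names `build_trie`, `longest_match` and the terminal marker `TERMINAL` are hypothetical and do not come from the figures. A node carrying the terminal marker plays the role of a double circle node.

```python
TERMINAL = ""  # marker key; never collides with a one-character edge label


def build_trie(meta_symbols):
    """Build a nested-dict TRIE; a node holding TERMINAL ends a meta-symbol."""
    root = {}
    for meta in meta_symbols:
        node = root
        for ch in meta:
            node = node.setdefault(ch, {})
        node[TERMINAL] = True
    return root


def longest_match(trie, text, start=0):
    """Return the longest meta-symbol matching text from index start, or None."""
    node = trie
    best = None
    for pos in range(start, len(text)):
        node = node.get(text[pos])
        if node is None:          # no edge for this character: stop tracing
            break
        if TERMINAL in node:      # remember the deepest terminal node so far
            best = text[start:pos + 1]
    return best


trie = build_trie(["TO", "TOKYO", "TOKYO METRO", "TOKYO METROPOLITAN",
                   "METROPOLITAN COUNCIL", "COUNCIL"])
symbol = "TOKYO METROPOLITAN COUNCIL"
```

Tracing stops as soon as no edge matches, so the cost of one lookup is bounded by the length of the longest meta-symbol rather than by the size of the dictionary.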
In the first step of compiling the symbol dictionary, the symbol covering means 103 reads the symbol data in FIG. 15 stored in the symbol data 101 sequentially, retrieves the meta-symbol dictionary of the content as shown in FIG. 30 stored in the meta-symbol dictionary 102, and finds the covering result by the maximal word extraction method. The maximal word extraction method is a method of word extraction in which, out of the collations of the meta-symbols in the meta-symbol dictionary with the various partial character strings of a certain symbol S, only those collations are taken out whose collating character interval is not completely contained in the collating character interval of any other collation. For example, supposing a symbol “TOKYO METROPOLITAN COUNCIL,” when the maximal word extraction is performed by retrieving the meta-symbol dictionary containing six meta-symbols (TO, TOKYO, TOKYO METRO, TOKYO METROPOLITAN, METROPOLITAN COUNCIL, COUNCIL), by the covering process as shown in the flowchart in FIG. 6, the covering result is obtained as follows.

-   -   {(TOKYO METROPOLITAN, 1, 19), (METROPOLITAN COUNCIL, 7, 27)}

In (X, s, e), X denotes the meta-symbol, s is the collating start character position, and e is the collating end character position (that is, the position of the character immediately at the right side of the collated portion), and this set of three pieces is called the covering element. Meta-symbols such as “TO”, “TOKYO”, and “TOKYO METRO” are contained in the meta-symbol dictionary and collate with partial character strings of “TOKYO METROPOLITAN COUNCIL”, but since they are completely included in the collating portion of “TOKYO METROPOLITAN” (the first two words), they are not included in the covering result. The process of “removing a collation completely contained in another collating portion” is judged at step 603 in FIG. 6: for the meta-symbol M collating from the i-th character to the (i+j−1)-th character of symbol S, if among the results already collated from the first to the (i−1)-th character there is a collation (X, s, e) whose collating end position (= character position at the right side of the collating portion) e is i+j or more, the collating character interval [s, e−1] of that collation completely includes the character interval [i, i+j−1]; in this case the judgement at step 603 is [yes], and the collated result is not added to the covering result set C.
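The covering process and the containment judgement of step 603 can be sketched as follows. This is a minimal sketch, not a transcription of the flowchart in FIG. 6: the function name `maximal_cover` is hypothetical, and the longest match at each position is found by trying substrings against a plain set instead of tracing a TRIE.

```python
def maximal_cover(symbol, meta_symbols):
    """Covering by maximal word extraction.

    Returns covering elements (X, s, e) with 1-based start position s and
    end position e pointing just past the collated portion.
    """
    metas = set(meta_symbols)
    max_len = max((len(m) for m in metas), default=0)
    cover = []
    furthest_end = 0  # 0-based exclusive end of the collations accepted so far
    for i in range(len(symbol)):
        # Longest meta-symbol starting at index i (longest candidates first).
        for length in range(min(max_len, len(symbol) - i), 0, -1):
            if symbol[i:i + length] in metas:
                # Step 603 analogue: skip a collation that an earlier,
                # longer-reaching collation already contains completely.
                if i + length > furthest_end:
                    cover.append((symbol[i:i + length], i + 1, i + length + 1))
                    furthest_end = i + length
                break
    return cover


metas = ["TO", "TOKYO", "TOKYO METRO", "TOKYO METROPOLITAN",
         "METROPOLITAN COUNCIL", "COUNCIL"]
cover = maximal_cover("TOKYO METROPOLITAN COUNCIL", metas)
```

Because positions are scanned from left to right and only the longest match is taken at each position, tracking the single largest end position seen so far is enough to detect containment; "COUNCIL" at characters 20 to 26 is rejected because "METROPOLITAN COUNCIL" already reaches character 26.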

Incidentally, since all meta-symbols in the meta-symbol dictionary in FIG. 16 are composed of one character only, in the covering process of the symbol covering means at the first step, the extracted meta-symbol is always one character, the judgement at step 603 in FIG. 6 is never [yes], and, for example, the symbol

-   -   1998-JAN-01/AM0200/KAWAYASU

is covered as follows

-   -   {(1, 1, 2), (9, 2, 3), . . . , (S, 26, 27), (U, 27, 28)}

in one character each. Since all characters possibly appearing in the symbol are included in the meta-symbol dictionary, a non-empty covering result is always obtained. The covering result of each symbol is transferred to the meta-symbol summing means 104, and the number of times each meta-symbol is extracted as a covering element is recorded in the meta-symbol frequency table 105. For example, after processing of the first symbol “1998-JAN-01/AM0200/KAWAYASU” in the symbol data, the meta-symbol frequency table contains “-” twice, “/” twice, “0” four times, “1” twice, “2” once, “8” once, “9” twice, “A” five times, “J” once, “K” once, “M” once, “N” once, “S” once, “U” once, “W” once, and “Y” once. The frequency of other characters is zero. After processing 1000 symbols in the symbol data 101, the content of the meta-symbol frequency table is as shown in FIG. 17. In this example, since all symbols are in the format of “yyyy-mmm-dd/XXhhmm/name,” the frequency of “-” and of “/” is 2000 times each (twice in every symbol × 1000 symbols). It is also clear that the meta-symbols “H”, “I”, “Q”, “X”, “Z” do not appear at all in the symbol data in FIG. 15. At this moment, the five meta-symbols of which frequency is zero in the meta-symbol frequency table are deleted from the meta-symbol dictionary. FIG. 18 is a frequency table concerning each meta-symbol after deletion. The threshold C1 is 50, and all meta-symbols in FIG. 18 have one character, so that their number of characters is less than the maximum of 1000, and therefore the meta-symbol dictionary update judging means 106 adds, concerning all meta-symbols in FIG. 18, two-character meta-symbols such as “--”, “-/”, and “-0” obtained by adding each meta-symbol in FIG. 18 to the end, and updates the meta-symbol dictionary to one including the meta-symbols as shown in FIG. 19. The actual structure of the meta-symbol dictionary is built and held as a digital tree structure (that is, TRIE) as shown in FIG. 31. This ends the first step of compiling the symbol dictionary.
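The summing and update of the first step can be sketched as follows. This is a minimal sketch under stated assumptions: because every meta-symbol has one character at this stage, the covering is replaced by a plain character count, and the names `first_step` and `ALPHABET` are hypothetical. The threshold C1 is taken as 1/20 of the number of symbols, as in the example above.

```python
import string
from collections import Counter

# The 38 characters the example allows: "-", "/", digits, and capital letters.
ALPHABET = "-/" + string.digits + string.ascii_uppercase


def first_step(symbols, alphabet=ALPHABET, max_chars=1000):
    """Count one-character meta-symbols, drop unused ones, extend frequent ones."""
    # With one-character meta-symbols only, each character is one covering element.
    freq = Counter(ch for symbol in symbols for ch in symbol)
    # Delete the meta-symbols of which frequency is zero.
    singles = [ch for ch in alphabet if freq[ch] > 0]
    c1 = len(symbols) / 20  # additional threshold C1
    # Append every remaining one-character meta-symbol to each frequent one.
    extended = {m + ch
                for m in singles if freq[m] >= c1 and len(m) < max_chars
                for ch in singles}
    return freq, singles, set(singles) | extended


freq, singles, metas = first_step(["1998-JAN-01/AM0200/KAWAYASU"])
```

Run on the first symbol alone, the counts reproduce the frequencies listed above (for example “A” five times and “0” four times); the thresholds only become meaningful on the full set of 1000 symbols.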

In the second step of compiling the symbol dictionary, the covering of each symbol and the summing of the frequency of extracted meta-symbols, as at the first step, are executed again by using the meta-symbol dictionary in FIG. 19. For example, the symbol

-   -   1998-JAN-01/AM0200/KAWAYASU

is covered as follows

-   -   {(19, 1, 3), (99, 2, 4), . . . , (AS, 25, 27), (SU, 26, 28)}

in two characters each. Since all characters possibly appearing in the symbol are included in the meta-symbol dictionary, a non-empty covering result is always obtained. In the covering process at the second step, meanwhile, since meta-symbols different in the number of characters are mixed in the meta-symbol dictionary 102, unlike the first step, the judgement at step 603 in the flowchart in FIG. 6 may be [yes], and the meta-symbols extracted in the longest match are not always all contained in the covering result (in the above example, since there is (SU, 26, 28), the final (U, 27, 28) is not included in the covering result). Of the meta-symbol frequency table after processing 1000 symbols in the symbol data 101, the portion of meta-symbols of which frequency is not zero is as shown in FIG. 20. Comparing it with FIG. 18, for example, the frequency of the meta-symbol “-”, which is 2000 in FIG. 18, is known to be dispersed in FIG. 20 into a total of 23 types of meta-symbols, consisting of 12 types of two-character meta-symbols starting with “-” such as “-0” and “-S”, and 11 types of meta-symbols ending with “-”, namely “8-”, “B-”, “C-”, “G-”, “L-”, “N-”, “P-”, “R-”, “T-”, “V-”, “Y-”. The total of the frequency of the 12 types of two-character meta-symbols starting with “-” and the total of the frequency of the 11 types of meta-symbols ending with “-” are both 2000, and it is known that each character “-” in the symbols is contained in two meta-symbols sharing this character, that is, a meta-symbol starting with “-” and a meta-symbol ending with “-”. Since the threshold C2 is 100, and all meta-symbols in FIG. 20 have 2 characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 updates the meta-symbol dictionary by adding three-character meta-symbols such as “-0-”, “-0/”, “-00”, obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the 62 types of meta-symbols of which frequency is 100 or more, such as “-0”, “-1”, “-2”, “-A” in FIG. 20. At the same time, the meta-symbol dictionary update judging means 106 does not terminate the second step because addition of meta-symbols has occurred once or more as shown above, but judges to continue the similar covering, summing, and updating process by using the updated meta-symbol dictionary successively. Of the meta-symbol frequency table after similarly covering and summing by using the updated meta-symbol dictionary 102, the portion of meta-symbols of which frequency is 1 or more is as shown in FIG. 21. For example, the symbol

-   -   1998-JAN-01/AM0200/KAWAYASU

is covered as follows

-   -   {(199, 1, 4), (998, 2, 5), (98-, 3, 6), (8-J, 4, 7), (-JA, 5, 8), (AN-, 7, 10), (N-0, 8, 11), . . . , (YAS, 24, 27), (ASU, 25, 28)}

and it is known that the longest match meta-symbol “JA” from the character “J” is not included in the covering result. In FIG. 21, two-character meta-symbols and three-character meta-symbols coexist, and it is known that, when covering, the symbols are covered by using three-character meta-symbols in the portions of high frequency of appearance, and by using two-character meta-symbols in the portions of relatively low frequency of appearance. Since the threshold C3 is 100, and all meta-symbols in FIG. 21 have 3 or fewer characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 updates the meta-symbol dictionary by adding meta-symbols such as “-01-”, “-01/”, “-010”, obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the 42 types of meta-symbols of which frequency is 100 or more, such as “-01”, “-02”, “-03”, “-04” in FIG. 21. At the same time, the meta-symbol dictionary update judging means 106 does not terminate the second step because addition of meta-symbols has occurred once or more as shown above, but judges to continue the similar covering, summing, and updating process by using the updated meta-symbol dictionary successively. Of the meta-symbol frequency table after similarly covering and summing by using the updated meta-symbol dictionary 102, the portion of meta-symbols of which frequency is 1 or more is as shown in FIG. 22. Since the threshold C4 is 100, and all meta-symbols in FIG. 22 have 4 or fewer characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 updates the meta-symbol dictionary by adding meta-symbols such as “-NOV-”, “-NOV/”, “-NOV0”, obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the 31 types of meta-symbols of which frequency is 100 or more, such as “-NOV”, “/AM0”, “/AM1”, “/KAW” in FIG. 22.

At the same time, the meta-symbol dictionary update judging means 106 does not terminate the second step because addition of meta-symbols has occurred once or more as shown above, but judges to continue the similar covering, summing, and updating process by using the updated meta-symbol dictionary successively. Of the meta-symbol frequency table after similarly covering and summing by using the updated meta-symbol dictionary 102, the portion of meta-symbols of which frequency is 1 or more is as shown in FIG. 23. Comparing FIG. 23 and FIG. 22, the number of types of meta-symbols with frequency of 1 or more is decreased by two types in spite of the addition of meta-symbols, and it is confirmed that the meta-symbols smaller in the number of characters are “being shut out” from the extraction result of the maximal word extraction by meta-symbols with a larger number of characters. Since the threshold C5 is 100, and all meta-symbols in FIG. 23 have 5 or fewer characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 updates the meta-symbol dictionary by adding meta-symbols such as “-NOV--”, “-NOV-/”, “-NOV-0”, obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the 20 types of meta-symbols of which frequency is 100 or more, such as “-NOV-”, “/KAWA”, “/SUDA”, “/SUKA” in FIG. 23.

At the same time, the meta-symbol dictionary update judging means 106 does not terminate the second step because addition of meta-symbols has occurred once or more as shown above, but judges to continue the similar covering, summing, and updating process by using the updated meta-symbol dictionary successively. Of the meta-symbol frequency table after similarly covering and summing by using the updated meta-symbol dictionary 102, the portion of meta-symbols of which frequency is 1 or more is as shown in FIG. 24. Comparing FIG. 24 and FIG. 22, the number of types of meta-symbols with frequency of 1 or more is decreased further, and it is confirmed that the meta-symbols smaller in the number of characters are “being shut out” from the extraction result of the maximal word extraction by meta-symbols with a larger number of characters. Since the threshold C6 is 100, and all meta-symbols in FIG. 24 have 6 or fewer characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 updates the meta-symbol dictionary by adding meta-symbols obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the 16 types of meta-symbols of which frequency is 100 or more in FIG. 24.

At the same time, the meta-symbol dictionary update judging means 106 does not terminate the second step because addition of meta-symbols has occurred once or more as shown above, but judges to continue the similar covering, summing, and updating process by using the updated meta-symbol dictionary successively. Of the meta-symbol frequency table after similarly covering and summing by using the updated meta-symbol dictionary 102, the portion of meta-symbols of which frequency is 1 or more is as shown in FIG. 25. Since the threshold C7 is 100, and all meta-symbols in FIG. 25 have 7 or fewer characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 attempts to add meta-symbols obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the nine types of meta-symbols of which frequency is 100 or more in FIG. 25, but, as for the five types of meta-symbols “/SUDA”, “/SUKAWA”, “/SUWA”, “98-NOV”, “WADA”, since the meta-symbols obtained by adding one character at the end are all already included in the meta-symbol dictionary, it updates the meta-symbol dictionary by adding to the remaining four types of meta-symbols. At the same time, the meta-symbol dictionary update judging means 106 does not terminate the second step because addition of meta-symbols has occurred once or more as shown above, but judges to continue the similar covering, summing, and updating process by using the updated meta-symbol dictionary successively. Comparing FIG. 25 and FIG. 24, the frequency of the meta-symbol “SUKAWA” is decreased from 187 to 81. It is confirmed that, due to the presence of the meta-symbol “/SUKAWA” in FIG. 25, the “SUKAWA” of the symbols “1998 . . . /SUKAWA” is deleted from the covering result, and that only the frequency of the “SUKAWA” of the symbols “1998 . . . YASUKAWA” is left over. Of the meta-symbol frequency table after the similar covering and summing process by using the updated meta-symbol dictionary 102, the portion of meta-symbols of which frequency is 1 or more is as shown in FIG. 26. Since the threshold C8 is 100, and all meta-symbols in FIG. 26 have 8 or fewer characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 attempts to add meta-symbols obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the six types of meta-symbols of which frequency is 100 or more in FIG. 26, but, as for the five types of meta-symbols “/SUDA”, “/SUKAWA”, “/SUWA”, “98-NOV”, “WADA”, since the meta-symbols obtained by adding one character at the end are all already included in the meta-symbol dictionary, it updates the meta-symbol dictionary by adding to the remaining meta-symbol “1998-NOV”.

At the same time, the meta-symbol dictionary update judging means 106 does not terminate the second step because addition of meta-symbols has occurred once or more as shown above, but judges to continue the similar covering, summing, and updating process by using the updated meta-symbol dictionary successively. Of the meta-symbol frequency table after similarly covering and summing by using the updated meta-symbol dictionary 102, the portion of meta-symbols of which frequency is 1 or more is as shown in FIG. 27. Since the threshold C9 is 100, and all meta-symbols in FIG. 27 have 9 or fewer characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 attempts to add meta-symbols obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the five types of meta-symbols of which frequency is 100 or more in FIG. 27, but, as for the four types of meta-symbols “/SUDA”, “/SUKAWA”, “/SUWA”, “WADA”, since the meta-symbols obtained by adding one character at the end are all already included in the meta-symbol dictionary, it updates the meta-symbol dictionary by adding to the remaining meta-symbol “1998-NOV-”. At the same time, the meta-symbol dictionary update judging means 106 does not terminate the second step because addition of meta-symbols has occurred once or more as shown above, but judges to continue the similar covering, summing, and updating process by using the updated meta-symbol dictionary successively. Of the meta-symbol frequency table after the similar covering and summing process by using the updated meta-symbol dictionary 102, the portion of meta-symbols of which frequency is 1 or more is as shown in FIG. 28. Since the threshold C10 is 100, and all meta-symbols in FIG. 28 have 10 or fewer characters, which is less than the maximum number of characters of 1000, the meta-symbol dictionary update judging means 106 attempts to add meta-symbols obtained by adding each meta-symbol in FIG. 18 (that is, one-character meta-symbol) to the end, concerning the four types of meta-symbols of which frequency is 100 or more in FIG. 28, but, as for these four types of meta-symbols “/SUDA”, “/SUKAWA”, “/SUWA”, “WADA”, since the meta-symbols obtained by adding one character at the end are all already included in the meta-symbol dictionary, no additional processing is done and the meta-symbol dictionary is not updated. Thus, since no addition of meta-symbols occurs, the meta-symbol dictionary update judging means 106 terminates the second step.
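The repetition of covering, summing, and updating in the first and second steps can be sketched as one loop. This is a minimal sketch under stated assumptions, not the apparatus of FIG. 1: the names `maximal_cover` and `grow_meta_symbols` are hypothetical, the dictionary is a plain set instead of a TRIE, and the thresholds follow the example (1/20 of the number of symbols at the first step, 1/10 while k<10 and 1/5 from k=10 at the second step).

```python
from collections import Counter


def maximal_cover(symbol, metas, max_len):
    """Longest match at each start position, skipping contained collations."""
    cover, furthest_end = [], 0
    for i in range(len(symbol)):
        for length in range(min(max_len, len(symbol) - i), 0, -1):
            if symbol[i:i + length] in metas:
                if i + length > furthest_end:
                    cover.append((symbol[i:i + length], i + 1, i + length + 1))
                    furthest_end = i + length
                break
    return cover


def grow_meta_symbols(symbols, max_chars=1000):
    """Return (meta-symbol set, last frequency table) after the second step ends."""
    n = len(symbols)
    freq = Counter(ch for s in symbols for ch in s)
    singles = sorted(freq)  # one-character meta-symbols that actually occur
    metas = set(singles)
    # First step: extend one-character meta-symbols reaching threshold C1.
    metas |= {m + ch for m in singles if freq[m] >= n / 20 for ch in singles}
    k = 2
    while True:
        longest = max((len(m) for m in metas), default=0)
        freq = Counter(m for s in symbols
                       for m, _s, _e in maximal_cover(s, metas, longest))
        threshold = n / 10 if k < 10 else n / 5  # additional threshold Ck
        added = {m + ch
                 for m, f in freq.items() if f >= threshold and len(m) < max_chars
                 for ch in singles} - metas
        if not added:  # no addition occurred: the second step terminates
            return metas, freq
        metas |= added
        k += 1


metas, freq = grow_meta_symbols(["AB", "AB", "AB", "AB"])
```

In this toy run the loop stops after two repetitions: “AB” is extended to “ABA” and “ABB” once, and on the next pass every extension of “AB” is already in the dictionary, which mirrors the terminating condition described above.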

At a third step of compiling the symbol dictionary, the meta-symbol dictionary update judging means 106 refers to the meta-symbol frequency table 105, and deletes from the meta-symbol dictionary 102 the meta-symbols of two characters or more having a frequency of less than the threshold E (that is, 5). In FIG. 28, none of the meta-symbols with frequency of 1 or more has a frequency of less than 5, so that the only meta-symbols below the threshold are those of which frequency is 0; all such meta-symbols with two characters or more are deleted, and the content of the meta-symbol dictionary 102 is consequently the sum of the meta-symbols in FIG. 28 and the meta-symbols in FIG. 18. The actual structure of the meta-symbol dictionary is built and held as a digital tree data structure (that is, TRIE) as shown in FIG. 32. This ends the third step of compiling the symbol dictionary.
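The deletion at the third step amounts to a single filter, sketched below. The function name `prune_meta_symbols` and the sample data are hypothetical; one-character meta-symbols are kept regardless of frequency, since only meta-symbols of two characters or more are subject to the threshold E.

```python
def prune_meta_symbols(meta_symbols, freq, threshold_e=5):
    """Keep one-character meta-symbols and multi-character ones with freq >= E."""
    return {m for m in meta_symbols
            if len(m) == 1 or freq.get(m, 0) >= threshold_e}


# Hypothetical frequency table: "AB" is frequent, "BA" is rare, "ABA" is unused.
kept = prune_meta_symbols({"A", "B", "AB", "BA", "ABA"},
                          {"AB": 7, "BA": 4, "A": 0})
```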

At a fourth step of compiling the symbol dictionary, the symbol covering means 103 finds the covering result by covering each symbol in the symbol data 101 by using the meta-symbol dictionary 102 in FIG. 32 calculated at the third step, and the meta-symbol appearance information compiling means 107 compiles, from the covering result, the meta-symbol appearance information 108 recording the symbol numbers of the symbols in which each meta-symbol appears and the appearance character positions. In this case, the meta-symbol appearance information as shown in FIG. 29 is compiled. In FIG. 29, however, for ease of interpretation, the symbol character string is used instead of the symbol number. This compilation process is by nature a so-called inversion, and it can be done efficiently by the technique generally employed in information retrieval systems. The collating character position is expressed by the number of characters at the left side of the collating portion (the left character count in FIG. 29) and the number of characters at the right side of the collating portion (the right character count in FIG. 29). The content as shown in FIG. 29 is recorded as a summary table for each meta-symbol, and by retrieving with the meta-symbol and the collating character position as the clues, the string (set) of numbers of the symbols including the designated meta-symbol at the designated character position can be obtained efficiently. This ends the fourth step of compiling the symbol dictionary.
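The inversion of the fourth step can be sketched as follows. This is a minimal sketch assuming the covering results are already available; the function name `build_appearance_info` and the dictionary layout keyed by (meta-symbol, left character count, right character count) are hypothetical choices for illustration, not the table format of FIG. 29.

```python
from collections import defaultdict


def build_appearance_info(covers):
    """Invert covering results into meta-symbol appearance information.

    covers: {symbol_number: (symbol_string, [(meta, s, e), ...])}
    returns: {(meta, left_count, right_count): set of symbol numbers}
    """
    info = defaultdict(set)
    for number, (symbol, elements) in covers.items():
        for meta, s, e in elements:
            left = s - 1                   # characters left of the collating portion
            right = len(symbol) - (e - 1)  # characters right of the collating portion
            info[(meta, left, right)].add(number)
    return dict(info)


covers = {
    1: ("TOKYO METROPOLITAN COUNCIL",
        [("TOKYO METROPOLITAN", 1, 19), ("METROPOLITAN COUNCIL", 7, 27)]),
}
info = build_appearance_info(covers)
```

Keying on both character counts lets a lookup ask for a meta-symbol at a position fixed from the left end, from the right end, or both.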

At a fifth step of compiling the symbol dictionary, the symbol dictionary compiling means 109 compiles a machine-retrievable symbol dictionary 110 from the meta-symbol dictionary 102 and the meta-symbol appearance information 108. At this time, the meta-symbol appearance information 108 is stored in the symbol dictionary 110 directly as the table shown in FIG. 29. As for the meta-symbol dictionary 102, in addition to the information in the TRIE structure shown in FIG. 32, the meta-symbol extension table shown in FIG. 33, which adds extended information on each meta-symbol, is also stored in the symbol dictionary 110 as meta-symbol information. The meta-symbol extension table in FIG. 33 records, for every meta-symbol M in the meta-symbol dictionary, triples each consisting of a meta-symbol in the meta-symbol dictionary that contains M as a character string and the numbers of characters of its left and right extended portions. For example, the extended information of the meta-symbol “-” is expressed as follows:

  {(-, 0, 0), (-01, 0, 2), . . . , (-29, 0, 2), (-3, 0, 1), . . . , (R-0, 1, 1), (R-3, 1, 1)}

This meta-symbol extension table can be compiled in the same manner as the meta-symbol appearance information 108 described above. This ends the fifth step of compiling the symbol dictionary; the symbol dictionary 110 is compiled, and the symbol dictionary compilation is complete.
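The extension table can be sketched in the same inverted style. In the following illustration the function name and the five-entry meta-symbol set are assumptions; the set is only a small excerpt chosen so that the entries for “-” agree with those quoted above.

```python
def build_extension_table(meta_symbols):
    """For every meta-symbol M, list (X, left_ext, right_ext) for each
    meta-symbol X containing M, where left_ext and right_ext are the numbers
    of characters of X to the left and to the right of M."""
    table = {}
    for m in meta_symbols:
        entries = []
        for x in meta_symbols:
            start = x.find(m)
            while start != -1:
                entries.append((x, start, len(x) - start - len(m)))
                start = x.find(m, start + 1)
        table[m] = sorted(entries)
    return table


# Hypothetical excerpt of a meta-symbol dictionary.
metas = {"-", "-01", "-3", "R-0", "R-3"}
table = build_extension_table(metas)
```

Every meta-symbol is listed as an extension of itself with both counts equal to 0, which corresponds to the entry (-, 0, 0) above.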

As explained herein, according to the symbol dictionary compiling method in the first embodiment of the invention, a meta-symbol dictionary is compiled that has meta-symbols with a larger number of characters for the partial character strings appearing at high frequency in the symbol data, and the covering information of each symbol is recorded by using this meta-symbol dictionary. The symbol dictionary can therefore be compiled with a smaller quantity of information, and when retrieving the symbol dictionary, a symbol including a partial character string appearing at high frequency can be retrieved at higher speed than in conventional symbol dictionary retrieval. Moreover, this meta-symbol dictionary compilation can be executed mechanically by setting the threshold, and an appropriate symbol dictionary suited to the deviation of the character string distribution of the symbol data can be compiled without requiring manual operation.

(Embodiment 2)

A second embodiment of the invention is described below while referring to the drawings. FIG. 2 is a block diagram showing a general constitution of a symbol dictionary retrieving apparatus. In FIG. 2, reference numeral 201 is a symbol dictionary storing meta-symbol information and meta-symbol appearance information; 202 is retrieval condition input means for entering a character string as the retrieval condition; 203 is question character string covering means for finding the covering result by covering the question character string, which is the retrieval condition entered from the retrieval condition input means 202, by the overlapped longest match word extraction method by using the symbol dictionary 201; 204 is the covering result determined by the question character string covering means 203; 205 is symbol number assessing means for assessing the symbol number completely coinciding with the question character string, that is, identical with the question character string, from the covering result 204 and the meta-symbol appearance information of the symbol dictionary 201; and 206 is retrieval result output means for issuing the symbol number assessed by the symbol number assessing means 205 and others.

The operation of the symbol dictionary retrieving apparatus thus constituted is explained below by referring to the drawings, using the example of the symbol dictionary presented in the first embodiment and an example of a simple retrieval condition. FIG. 7 is a flowchart describing the procedure for finding the covering result in the question character string covering means 203, FIG. 8 is a flowchart describing the procedure for assessing the symbol number in the symbol number assessing means 205, and FIG. 34 is a conceptual diagram describing the principal intermediate data in the process of symbol dictionary retrieval when the condition “Find the symbol number completely coinciding with the question character string 1998-NOV-01/PM1030/KAWAYASU” is given as the retrieval condition.

To begin with, at a first step of retrieving the symbol dictionary, the question character string covering means 203 retrieves the meta-symbol information in the symbol dictionary 201, finds the covering of the question character string 1998-NOV-01/PM1030/KAWAYASU by the overlapped longest match word extraction method, and obtains the covering result C of *STEP1 in FIG. 34. The overlapped longest match word extraction method is a covering method in which the meta-symbol of longest match is searched from the left side of the character string to be covered, while permitting partial overlap of meta-symbols; if the collating character interval of a certain meta-symbol A is completely contained in the union of the collating character intervals of one or more other meta-symbols B, . . . , X, such meta-symbol A is not recorded as a covering element. More specifically, at step 702 in FIG. 7, among the meta-symbols whose collating character interval extends further toward the end side than the immediately preceding extraction result without leaving a gap, first a set H of the meta-symbols covering furthest toward the end side is found, and the meta-symbol in H whose collating start position is closest to the beginning side, that is, the one having the largest number of characters, is used as a covering element. On the basis of the collating character interval of this covering element, the next covering element toward the end side is determined, and by this series of extraction processes from the beginning to the end, this covering method is intended to obtain a subset of the covering result obtained by the maximal word extraction method. In the case of this question character string, since the covering result 204 is not empty, processing at the symbol number assessing means 205 starts; if there is no covering result, the process is stopped immediately and no result is obtained.

The subsequent process conforms to FIG. 8. First, at step 801, an element (at most one in C) whose collating start character position s is 1 is searched for in the covering result. In this example, (1998-NOV-0, 1, 11) is found. Successively, in the meta-symbol appearance information in the symbol dictionary, all entries of the format (X, 0, n-e+1) (where n is the number of characters in the question character string; it is 27 in this example) in the appearance symbol information of M=1998-NOV-0 are searched, and the set of the symbol numbers of these symbols X is recorded as A. In FIG. 34, for ease of reading, sets A and B are described by using the symbol character string instead of the symbol number. In this example, the symbol numbers of symbols such as 1998-NOV-01/AM0830/NODA are determined. Once A is determined, the element selected herein (1998-NOV-0, 1, 11) is deleted from C. As a result, C becomes as shown in C at *STEP2 in FIG. 34, the condition judgment “Is C an empty set?” at step 802 in FIG. 8 is No, and the process advances to step 803. At step 803, in this example, the beginning element of C (1/P, 11, 14) is selected, and as B, the symbols (the numbers corresponding to the symbols) shown as B at *STEP3 in FIG. 34, including 1998-JAN-01/PM065/NODA and the like, are obtained. Then the common portion of A and B is found and stored in A. That is, the content of A is reduced to only the portion contained in B. In this example, the content of A is reduced to four symbols (their numbers). In succession, at the judgment at step 804 in FIG. 8, since A is not empty, the process advances to step 805, the element (1/P, 11, 14) selected at step 803 is deleted from C, and the process returns to step 802. Thereafter, from *STEP4 to *STEP6, an element is similarly selected from C successively, B is determined from the meta-symbol appearance information, and the intermediate result A is reduced. In this period, neither A nor C becomes empty, and the process is not terminated on the way.

Finally, after the process of *STEP7, C is empty at the end of step 805 in FIG. 8, so the judgment at step 802 is Yes, and the process in the symbol number assessing means 205 is terminated. The element of A, that is, the number of the symbol “1998-NOV-01/PM1030/KAWAYASU”, is issued to the retrieval result output means 206, and the symbol retrieval process is terminated.
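The covering and the assessing process of this embodiment can be sketched as follows. This is an illustration under stated assumptions, not the procedure of FIG. 7 and FIG. 8 itself: the function names are hypothetical, the appearance information indexes every occurrence of every meta-symbol rather than only covering elements, the sample symbols other than the question character string are illustrative, and every covering element is matched with both its left count s-1 and its right count n-e+1 (the text states this format explicitly only for the first element). Covering elements are written (meta-symbol, s, e) with a 1-based start s and an exclusive end e, as in (1998-NOV-0, 1, 11).

```python
from collections import defaultdict


def index(symbols, metas):
    """meta-symbol -> {(symbol_no, left_count, right_count)}, all occurrences."""
    info = defaultdict(set)
    for no, sym in enumerate(symbols):
        for m in metas:
            start = sym.find(m)
            while start != -1:
                info[m].add((no, start, len(sym) - start - len(m)))
                start = sym.find(m, start + 1)
    return info


def cover_query(query, metas):
    """Overlapped longest match: among the meta-symbols that continue the
    cover without a gap, take those reaching furthest right, then the one
    starting furthest left. Returns [] if the query cannot be covered."""
    matches = [(s, s + len(m), m) for m in metas
               for s in range(len(query)) if query.startswith(m, s)]
    cover, done = [], 0
    while done < len(query):
        cands = [c for c in matches if c[0] <= done < c[1]]
        if not cands:
            return []
        far = max(e for _, e, _ in cands)
        s, e, m = min((c for c in cands if c[1] == far), key=lambda c: c[0])
        cover.append((m, s + 1, e + 1))
        done = e
    return cover


def exact_match(query, metas, info):
    """Intersect, element by element, the symbols holding each covering
    element at exactly the position it has in the query."""
    n, result = len(query), None
    for m, s, e in cover_query(query, metas):
        b = {no for no, left, right in info.get(m, ())
             if left == s - 1 and right == n - e + 1}
        result = b if result is None else result & b
        if not result:
            return set()
    return result or set()


symbols = ["1998-NOV-01/AM0830/NODA", "1998-NOV-01/PM1030/KAWAYASU",
           "1998-JAN-01/PM0650/NODA"]
metas = set("".join(symbols)) | {"1998-NOV-0", "1/P", "KAWA"}
info = index(symbols, metas)
```

Because the left and right counts of the first element fix the total length of a candidate symbol, a question character string that is only a prefix of a stored symbol yields the empty set in this sketch.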

As explained herein, according to the symbol dictionary retrieving method in the second embodiment of the invention, meta-symbol information having meta-symbols with a larger number of characters is compiled for the partial character strings appearing at high frequency in the symbol data, the covering result is first composed from the question character string by using this meta-symbol information, and the retrieval is processed by using this covering result and the meta-symbol appearance information. Therefore, even the retrieval of a symbol containing a partial character string appearing at high frequency can be done faster than in conventional symbol dictionary retrieval.

(Embodiment 3)

A third embodiment of the invention is described below while referring to the drawings. FIG. 3 is a block diagram showing a general constitution of a symbol dictionary retrieving apparatus. In FIG. 3, reference numeral 301 is a symbol dictionary storing meta-symbol information and meta-symbol appearance information; 302 is retrieval condition input means for entering a character string as the retrieval condition; 303 is question character string covering means for finding the covering result by covering the question character string, which is the retrieval condition entered from the retrieval condition input means 302, by the maximal word extraction method by using the symbol dictionary 301; 304 is the covering result determined by the question character string covering means 303; 305 is symbol number set assessing means for assessing the set of symbol numbers coinciding forward with the question character string, that is, containing the question character string in the beginning portion, from the covering result 304 and the meta-symbol appearance information of the symbol dictionary 301; 307 is right extended meta-symbol assessing means for retrieving the meta-symbol information in the symbol dictionary 301, finding, out of the extended meta-symbols of the meta-symbol Z (that is, the meta-symbols containing Z) of the covering element largest in the collating start character position in the covering result 304, all of the sets of the number of the meta-symbol and the collating position of each right extended meta-symbol (that is, a meta-symbol containing R in the beginning portion) of a rightmost partial character string R of the question character string, and adding and storing them to the covering result 304; and 306 is retrieval result output means for issuing the symbol numbers assessed by the symbol number set assessing means 305 and others. The constituent elements 301 to 304 in FIG. 3 correspond to the constituent elements 201 to 204 in FIG. 2, which is the block diagram of the second embodiment.

The operation of the symbol dictionary retrieving apparatus thus constituted is explained below by referring to the drawings, using the example of the symbol dictionary presented in the first embodiment and an example of a simple retrieval condition. FIG. 7 is a flowchart describing the procedure for finding the covering result in the question character string covering means 303, FIGS. 9 and 10 are flowcharts describing the procedure for assessing the symbol number set in the symbol number set assessing means 305, and FIGS. 35 and 36 are conceptual diagrams describing the principal intermediate data in the process of symbol dictionary retrieval when the condition “Find the set of symbol numbers coinciding forward with the question character string 1998-NOV-01/PM” is given as the retrieval condition.

To begin with, at a first step of retrieving the symbol dictionary, the question character string covering means 303 retrieves the meta-symbol information in the symbol dictionary 301, finds the covering of the question character string 1998-NOV-01/PM by the overlapped longest match word extraction method, and obtains the covering result C of *STEP1 in FIG. 35. The procedure of the covering process is the same as the procedure of the covering process in embodiment 2. In the case of this question character string, since the covering result 304 is not empty, processing at the right extended meta-symbol assessing means 307 starts; if there is no covering result, the process is stopped immediately and no result is obtained. Subsequently, the right extended meta-symbol assessing means 307 retrieves the meta-symbol information in the symbol dictionary 301, and finds the extended meta-symbols of the meta-symbol Z (that is, the meta-symbols whose character string contains Z) of the covering element largest in the collating start position in the covering result 304. Of the obtained extended meta-symbols, only each meta-symbol X that is a right extended meta-symbol (that is, a meta-symbol containing the character string R in the beginning portion) of the j-th rightmost partial character string R of the question character string (that is, the partial character string from the j-th character to the final character in the question character string) is selected, and

  (X, j, |R|+J)

is added to the covering result 304. In this example, Z=M, and as its extended meta-symbols, 26 types are determined, that is,

  [/AM01], . . . , [/AM12], [/PM01], [/PM02], . . . , [/PM12], [1998-MAR], [1998-MAY]

Out of them, 12 types of meta-symbols

  [/PM01], [/PM02], . . . , [/PM12]

are added to the covering result 304 by the right extended meta-symbol assessing means 307 as the right extended meta-symbols of the rightmost partial string of the question character string 1998-NOV-01/PM. This state is shown in *STEP2 in FIG. 35. Thus, after covering up to the right extended meta-symbols, the symbol number set assessing means 305 determines the symbol number set. The subsequent process conforms to FIG. 9 and FIG. 10.

First, at step 901, the set D composed of the elements whose collating start character position s is 1 is determined from the covering result. In this example, D={(1998-NOV-0, 1, 11)}. The set SC of the final result is initialized to be empty. Since D is not empty, the process advances to step 903, the only element (1998-NOV-0, 1, 11) is selected from D, and in the meta-symbol appearance information in the symbol dictionary 301, all entries of the format (X, 0, *) in the appearance symbol information of M=1998-NOV-0 are searched, and the set of the symbol numbers of these symbols X is recorded as A. Herein, * denotes an arbitrary value (don't care). In FIG. 35 and FIG. 36, sets such as A, C and D are described by using the symbol character string instead of the symbol number, for ease of reading. In this example, at *STEP4, the symbol numbers of symbols such as 1998-NOV-01/AM0830/NODA are determined. Once A is determined, the element selected herein (1998-NOV-0, 1, 11) is deleted from D. The condition judgment “Is A an empty set?” at step 904 is No, and the process advances to step 905.

At step 905, it is judged whether q is greater than the number of characters n of the question character string, where p and q are the collating start and end character positions of the selected element (here p=1 and q=11). If q is greater, the elements of the set A at this moment are added to the set SC of the final result; if not, the procedure select_cover1 (A, p, q) in FIG. 10 is called. In this case, n=14 and q=11, and therefore q<n, so the process moves to step 907, the procedure select_cover1 (A, p, q) in FIG. 10 is called, and the process is advanced. At step 908 in FIG. 10, the set Dp composed of the elements of the covering result C whose collating start character position s is larger than p and not larger than q, where p and q are the character positions given as the procedure arguments, is determined.

In this example, p=1 and q=11, and when the elements satisfying 1<s≦11 are determined from C, D1={(1/P, 11, 14)} is obtained as shown in *STEP5 in FIG. 35. Since D1 is not empty, the judgment at step 909 is No, and the process advances to step 910. From D1, the first element (1/P, 11, 14) is selected, and from the appearance information of the meta-symbol 1/P, all elements of the format (X, 10, *) are searched; the intermediate result A is reduced to the portion common with these elements, and the result is stored in A1. Herein, * denotes an arbitrary value. In this example, as shown in *STEP6 in FIG. 35, A is reduced to three elements. Further, the element selected herein (1/P, 11, 14) is deleted from D1. Since A1 is not empty, the judgment at step 911 is No, and the process advances to step 912. As compared with n=14, it is u=12, and it is judged No at step 912.

At step 914, with A1, t=11 and u=14 as arguments, the procedure select_cover1 in FIG. 10 is called recursively, the process is continued, and the intermediate result Ap is reduced gradually as shown in FIG. 35 and FIG. 36. At *STEP18 in FIG. 36, since u=17, which is n=14 or more, All is recorded as part of the final result SC, and the retrieval process is further continued in order to search for other results. Thus, while combinations of covering elements are generated systematically from the covering result 304, the meta-symbol appearance information in the symbol dictionary 301 is retrieved, and the set of the symbol numbers commonly contained in each generated combination of covering elements is determined and recorded in the set SC of the final result. After all combinations of covering elements are processed, the process is terminated at *STEP20, and the SC at this time is the retrieval result.
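The recursive combination of covering elements for forward coincidence can be sketched as follows. This is a simplified illustration rather than the flow of FIG. 9 and FIG. 10: the function names are hypothetical, the appearance information indexes every occurrence of each meta-symbol, the covering result (including two right extended elements with end position j+|X|, chosen to agree with u=17 above) is written by hand, and all sample symbols except the first are illustrative. Elements are (meta-symbol, s, e) with a 1-based start and an exclusive end.

```python
from collections import defaultdict


def index(symbols, metas):
    """meta-symbol -> {(symbol_no, left_count, right_count)}, all occurrences."""
    info = defaultdict(set)
    for no, sym in enumerate(symbols):
        for m in metas:
            start = sym.find(m)
            while start != -1:
                info[m].add((no, start, len(sym) - start - len(m)))
                start = sym.find(m, start + 1)
    return info


def prefix_match(cover, info, n):
    """Collect the symbols that begin with the n-character question string."""
    results = set()

    def at(meta, s):
        # Entries of the format (X, s-1, *): the right count is don't care.
        return {no for no, left, _ in info.get(meta, ()) if left == s - 1}

    def extend(cands, p, q):
        if q > n:                      # the chain now reaches the query end
            results.update(cands)
            return
        for meta, s, e in cover:       # next element: p < s <= q, so no gap
            if p < s <= q:
                nxt = cands & at(meta, s)
                if nxt:
                    extend(nxt, s, e)

    for meta, s, e in cover:
        if s == 1:
            first = at(meta, 1)
            if first:
                extend(first, s, e)
    return results


symbols = ["1998-NOV-01/AM0830/NODA", "1998-NOV-01/PM1030/KAWAYASU",
           "1998-NOV-01/PM0100/NODA", "1998-JAN-01/PM0650/NODA"]
# Hand-written covering result for the question string 1998-NOV-01/PM (n=14).
cover = [("1998-NOV-0", 1, 11), ("1/P", 11, 14), ("M", 14, 15),
         ("/PM01", 12, 17), ("/PM10", 12, 17)]
info = index(symbols, {m for m, _, _ in cover})
found = prefix_match(cover, info, n=14)
```

Each recursion strictly increases the start position of the last element, so the enumeration terminates; the union over all chains plays the role of the set SC.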

As explained herein, according to the symbol dictionary retrieving method in the third embodiment of the invention, meta-symbol information having meta-symbols with a larger number of characters is compiled for the partial character strings appearing at high frequency in the symbol data, the covering result is first composed from the question character string by using this meta-symbol information, and the retrieval is processed by using this covering result, including the elements added by the right extended meta-symbol assessing means, and the meta-symbol appearance information. Therefore, even the forward coincidence retrieval of a symbol containing a partial character string appearing at high frequency can be done faster than in conventional symbol dictionary retrieval.

(Embodiment 4)

A fourth embodiment of the invention is described below while referring to the drawings. FIG. 4 is a block diagram showing a general constitution of a symbol dictionary retrieving apparatus. In FIG. 4, reference numeral 401 is a symbol dictionary storing meta-symbol information and meta-symbol appearance information; 402 is retrieval condition input means for entering a character string as the retrieval condition; 403 is question character string covering means for finding the covering result by covering the question character string, which is the retrieval condition entered from the retrieval condition input means 402, by the overlapped longest match word extraction method by using the symbol dictionary 401; 404 is the covering result determined by the question character string covering means 403; 408 is left extended meta-symbol assessing means for retrieving the meta-symbol information in the symbol dictionary 401, finding, out of the extended meta-symbols of the meta-symbol Z (that is, the meta-symbols containing Z) of the covering element whose collating start character position is 1 in the covering result 404, all of the sets of the number of the meta-symbol and the collating position of each left extended meta-symbol (that is, a meta-symbol containing L in the end portion) of a leftmost partial character string L of the question character string, and adding and storing them to the covering result 404; 405 is symbol number set assessing means for assessing the set of symbol numbers coinciding backward with the question character string, that is, containing the question character string in the end portion, from the covering result 404 and the meta-symbol appearance information of the symbol dictionary 401; and 406 is retrieval result output means for issuing the symbol numbers assessed by the symbol number set assessing means 405 and others. The constituent elements 401 to 404 in FIG. 4 correspond to the constituent elements 301 to 304 in FIG. 3, which is the block diagram of the third embodiment.

The operation of the symbol dictionary retrieving apparatus thus constituted is explained below by referring to the drawings, using the example of the symbol dictionary presented in the first embodiment and an example of a simple retrieval condition.

FIG. 7 is a flowchart describing the procedure for finding the covering result in the question character string covering means 403, FIGS. 11 and 12 are flowcharts describing the procedure for assessing the symbol number set in the symbol number set assessing means 405, and FIG. 37 is a conceptual diagram describing the principal intermediate data in the process of symbol dictionary retrieval when the condition “Find the set of symbol numbers coinciding backward with the question character string KAWA” is given as the retrieval condition. To begin with, at a first step of retrieving the symbol dictionary, the question character string covering means 403 retrieves the meta-symbol information in the symbol dictionary 401, finds the covering of the question character string KAWA by the overlapped longest match word extraction method, and obtains the covering result C of *STEP1 in FIG. 37. The procedure of the covering process is the same as the procedure of the covering process in embodiment 3. In the case of this question character string, since the covering result 404 is not empty, processing at the left extended meta-symbol assessing means 408 starts; if there is no covering result, the process is stopped immediately and no result is obtained. Subsequently, the left extended meta-symbol assessing means 408 retrieves the meta-symbol information in the symbol dictionary 401, and finds the extended meta-symbols of the meta-symbol Z (that is, the meta-symbols whose character string contains Z) of the covering element whose collating start position is 1 in the covering result 404. Of the obtained extended meta-symbols, only each meta-symbol X that is a left extended meta-symbol (that is, a meta-symbol containing the character string L in the end portion) of the j-th leftmost partial character string L of the question character string (that is, the partial character string from the first character to the j-th character in the question character string) is selected, and

  (X, j+1−|L|, j+1)

is added to the covering result 404. In this example, Z=KAWA, and as its extended meta-symbols, nine types are determined, that is,

  [/SUKAWA], [0/KAWAD], [0/KAWAN], [0/KAWAY], [5/KAWAD], [5/KAWAN], [5/KAWAY], [KAWA], [SUKAWA]

Out of them, two types of meta-symbols

  [/SUKAWA], [SUKAWA]

are added to the covering result 404 by the left extended meta-symbol assessing means 408 as the left extended meta-symbols of the leftmost partial string of the question character string KAWA. This state is shown in *STEP2 in FIG. 37. Thus, after covering up to the left extended meta-symbols, the symbol number set assessing means 405 determines the symbol number set. The subsequent process conforms to FIG. 11 and FIG. 12. First, at step 1001, the set D composed of the elements whose collating end character position e is n is determined from the covering result. In this example,

  D={(KAWA, 1, 5), (/SUKAWA, -2, 5), (SUKAWA, -1, 5)}

The set SC of the final result is initialized to be empty. Since D is not empty, the process advances to step 1003, the element (KAWA, 1, 5) is selected from D, and in the meta-symbol appearance information in the symbol dictionary 401, all entries of the format (X, *, 0) in the appearance symbol information of M=KAWA are searched, and the set of the symbol numbers of these symbols X is recorded as A. In FIG. 37, sets such as A, C and D are described by using the symbol character string instead of the symbol number, for ease of reading. In this example, at *STEP4, the symbol numbers of symbols such as 1998-JAN-17/PM0930/NOKAWA are determined.

Once A is determined, the element selected herein (KAWA, 1, 5) is deleted from D. The condition judgment “Is A an empty set?” at step 1004 is No, and the process advances to step 1005. At step 1005, it is judged whether the collating start position t of the selected covering element is 1 or less; if it is 1 or less, the elements of the set A at this moment are added to the set SC of the final result, and if it is 2 or more, the procedure select_cover2 (A, p, q) in FIG. 12 is called. In this case, since t=1, the elements of the set A at this moment are added to the set SC of the final result, and the retrieval process is continued in order to search for other results. Thus, while combinations of covering elements are generated systematically from the covering result 404, the meta-symbol appearance information in the symbol dictionary 401 is retrieved, and the set of the symbol numbers commonly contained in each generated combination of covering elements is determined and recorded in the set SC of the final result. After all combinations of covering elements are processed, the process is terminated at *STEP6, and the SC at this time is the retrieval result.

As explained herein, according to the symbol dictionary retrieving method in the fourth embodiment of the invention, meta-symbol information having meta-symbols with a larger number of characters is compiled for the partial character strings appearing at high frequency in the symbol data, the covering result is first composed from the question character string by using this meta-symbol information, and the retrieval is processed by using this covering result, including the elements added by the left extended meta-symbol assessing means, and the meta-symbol appearance information. Therefore, even the backward coincidence retrieval of a symbol containing a partial character string appearing at high frequency can be done faster than in conventional symbol dictionary retrieval.
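The backward coincidence process mirrors the forward one, working from the end of the question character string toward its beginning. The following is a simplified sketch, not the flow of FIG. 11 and FIG. 12: the function names are hypothetical, the appearance information indexes every occurrence of each meta-symbol, the sample symbols other than 1998-JAN-17/PM0930/NOKAWA are illustrative, and the starting elements are taken to be those with exclusive end position n+1, which is the convention of the elements (KAWA, 1, 5) quoted above for n=4.

```python
from collections import defaultdict


def index(symbols, metas):
    """meta-symbol -> {(symbol_no, left_count, right_count)}, all occurrences."""
    info = defaultdict(set)
    for no, sym in enumerate(symbols):
        for m in metas:
            start = sym.find(m)
            while start != -1:
                info[m].add((no, start, len(sym) - start - len(m)))
                start = sym.find(m, start + 1)
    return info


def suffix_match(cover, info, n):
    """Collect the symbols that end with the n-character question string."""
    results = set()

    def at(meta, e):
        # Entries of the format (X, *, n+1-e): the left count is don't care.
        return {no for no, _, right in info.get(meta, ()) if right == n + 1 - e}

    def extend(cands, t, u):
        if t <= 1:                     # the chain now reaches the query start
            results.update(cands)
            return
        for meta, s, e in cover:       # next element: t <= e < u, so no gap
            if t <= e < u:
                nxt = cands & at(meta, e)
                if nxt:
                    extend(nxt, s, e)

    for meta, s, e in cover:
        if e == n + 1:
            last = at(meta, e)
            if last:
                extend(last, s, e)
    return results


symbols = ["1998-JAN-17/PM0930/NOKAWA", "1998-NOV-01/PM1030/KAWAYASU",
           "1998-NOV-02/AM0900/SUKAWA"]
info = index(symbols, {"KAWA", "/SUKAWA", "SUKAWA", "WA", "KAW"})
# Covering result of KAWA with the two left extended elements added.
cover = [("KAWA", 1, 5), ("/SUKAWA", -2, 5), ("SUKAWA", -1, 5)]
found = suffix_match(cover, info, n=4)
```

With the covering result above, every chain stops immediately because each starting element already begins at position 1 or less, as in the walk-through; a covering made of shorter elements such as (WA, 3, 5) and (KAW, 1, 4) exercises the recursion.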

(Embodiment 5)

A fifth embodiment of the invention is described below while referring to the drawings. FIG. 5 is a block diagram showing a general constitution of a symbol dictionary retrieving apparatus. In FIG. 5, reference numeral 501 is a symbol dictionary storing meta-symbol information and meta-symbol appearance information; 502 is retrieval condition input means for entering a character string as the retrieval condition; 503 is question character string covering means for finding the covering result by covering the question character string, which is the retrieval condition entered from the retrieval condition input means 502, by the overlapped longest match word extraction method by using the symbol dictionary 501; 504 is the covering result determined by the question character string covering means 503; 507 is right extended meta-symbol assessing means for retrieving the meta-symbol information in the symbol dictionary 501, finding, out of the extended meta-symbols of the meta-symbol Z (that is, the meta-symbols containing Z) of the covering element largest in the collating start character position in the covering result 504, all of the sets of the number of the meta-symbol and the collating position of each right extended meta-symbol (that is, a meta-symbol containing R in the beginning portion) of a rightmost partial character string R of the question character string, and adding and storing them to the covering result 504; 508 is left extended meta-symbol assessing means for retrieving the meta-symbol information in the symbol dictionary 501, finding, out of the extended meta-symbols of the meta-symbol Z (that is, the meta-symbols containing Z) of the covering element whose collating start character position is 1 in the covering result 504, all of the sets of the number of the meta-symbol and the collating position of each left extended meta-symbol (that is, a meta-symbol containing L in the end portion) of a leftmost partial character string L of the question character string, and adding and storing them to the covering result 504; 509 is both extended meta-symbol assessing means for retrieving the meta-symbol information in the symbol dictionary 501, retrieving all of the both extended meta-symbols X of the question character string (that is, the meta-symbols containing the question character string Q in the portion from the j-th character to the j+|Q|-th character, where 1<j), and adding and storing the elements (X, 1−j, 1−j+|X|) to the covering result 504; 505 is symbol number set assessing means for assessing the set of symbol numbers coinciding intermediately with the question character string, that is, containing the question character string, from the covering result 504 and the meta-symbol appearance information of the symbol dictionary 501; and 506 is retrieval result output means for issuing the symbol numbers assessed by the symbol number set assessing means 505 and others. The constituent elements 501 to 504 and 506 in FIG. 5 correspond to the constituent elements 201 to 204 and 206 in FIG. 2, which is the block diagram of the second embodiment; the constituent element 507 in FIG. 5 corresponds to the constituent element 307 in FIG. 3, the block diagram of the third embodiment; and the constituent element 508 in FIG. 5 corresponds to the constituent element 408 in FIG. 4, the block diagram of the fourth embodiment.

The operation of the symbol dictionary retrieving apparatus thus constituted is explained below by referring to the drawings, using the example of the symbol dictionary presented in the first embodiment and an example of a simple retrieval condition.

FIG. 7 is a flowchart describing the procedure for finding the covering result in the question character string covering means 503, FIGS. 13 and 14 are flowcharts describing the procedure for assessing the symbol number set in the symbol number set assessing means 505, and FIGS. 38 and 39 are conceptual diagrams describing the principal intermediate data in the process of symbol dictionary retrieval when the retrieval condition is “Find the set of symbol numbers coinciding intermediately with the question character string KAWADA”. To begin with, at a first step of retrieving the symbol dictionary, the question character string covering means 503 retrieves the meta-symbol information in the symbol dictionary 501, finds the covering of the question character string KAWADA by the overlapped longest match word extraction method, and obtains the covering result C of *STEP1 in FIG. 38. The procedure of the covering process is the same as that of the covering process in embodiment 3. For this question character string, the covering result 504 is not empty, so processing at the right extended meta-symbol assessing means 507 starts; if there were no covering, the process would be stopped immediately with no retrieval result. Next, the right extended meta-symbol assessing means 507 retrieves the meta-symbol information in the symbol dictionary 501, and finds the extended meta-symbols of the meta-symbol Z of the covering element having the largest collating start position among the covering result 504 (that is, meta-symbols whose character string contains Z). Of the obtained extended meta-symbols, only the meta-symbols X that are right extended meta-symbols (that is, meta-symbols containing the character string R in the beginning portion) of the j-th rightmost partial character string R of the question character string (that is, the partial character string from the j-th character to the final character in the question character string) are selected, and

(X, j, |R|+1)

is added to the covering result 504. In this example, Z=WADA, and only one extended meta-symbol is found, namely WADA itself. Since it is also a right extended meta-symbol of the rightmost partial string of the question character string KAWADA, the right extended meta-symbol assessing means 507 adds it to the covering result 504; however, since the same covering element is already contained in the covering result 504, the covering result 504 is not changed. This mode is shown in *STEP2 in FIG. 38. Next, the left extended meta-symbol assessing means 508 retrieves the meta-symbol information in the symbol dictionary 501, and finds the extended meta-symbols of the meta-symbol Z of the covering element whose collating start position is 1 among the covering result 504 (that is, meta-symbols whose character string contains Z). Of the obtained extended meta-symbols, only the meta-symbols X that are left extended meta-symbols (that is, meta-symbols containing the character string L in the end portion) of the j-th leftmost partial character string L of the question character string (that is, the partial character string from the first character to the j-th character in the question character string) are selected, and

(X, j+1−|L|, j+1)

is added to the covering result 504. In this example, Z=KAWA, and nine extended meta-symbols are found, namely

[/SUKAWA], [0/KAWAD], [0/KAWAN], [0/KAWAY], [5/KAWAD], [5/KAWAN], [5/KAWAY], [KAWA], [SUKAWA]

Out of them, five types of meta-symbols

[/SUKAWA], [0/KAWAD], [5/KAWAD], [KAWA], [SUKAWA]

are added to the covering result 504 by the left extended meta-symbol assessing means 508 as the left extended meta-symbols of the leftmost partial string of the question character string KAWADA. This mode is shown in *STEP3 in FIG. 38. Next, the both extended meta-symbol assessing means 509 retrieves the meta-symbol information in the symbol dictionary 501, retrieves all of the both extended meta-symbols X of the question character string KAWADA (that is, the meta-symbols containing the question character string KAWADA in the portion from the j-th character to the j+6-th character, where 1<j), and adds elements (X, 1−j, 1−j+|X|) to the covering result 504. In this example, no meta-symbol containing KAWADA is present in the meta-symbol information in the symbol dictionary 501, and nothing is added to the covering result 504. This mode is shown in *STEP4 in FIG. 42. Thus, after the covering has been extended with the right extended meta-symbols, left extended meta-symbols, and both extended meta-symbols, the symbol number set assessing means 505 determines the symbol number set. The subsequent process conforms to FIG. 13 and FIG. 14. First, at step 1101, the set D composed of the elements whose collating start character position s is 1 or less is determined from the covering result. In this example,

D={(KAWA, 1, 5), (/SUKAWA, -2, 5), (0/KAWAD, -1, 6), (5/KAWAD, -1, 6), (SUKAWA, -1, 5)}

The set SC of the final result is initialized to be empty.

This mode is shown in *STEP5 in FIG. 38. Since D is not empty, the process advances to step 1103, where the element (KAWA, 1, 5) is selected from D, and, from the meta-symbol appearance information in the symbol dictionary 501, the elements in the format (X, L) are collected from each piece of appearance symbol information (X, L, R) of M=KAWA and recorded as the set A. In FIG. 38 and FIG. 39, sets such as A, C and D are described by using the symbol character string, instead of the symbol number, for ease of reading. In this example, at *STEP6, elements such as 1998-JAN-17/PM0930/NOKAWA are determined. Once A is determined, the selected element (KAWA, 1, 5) is deleted from D. The result of the condition judgment “Is A an empty set?” at step 1104 is No, and the process advances to step 1105. At step 1105, it is judged whether the collating end position q of the selected covering element is greater than n; if it is greater than n, the elements of the set A at this moment are added to the set SC of the final result, and otherwise the procedure select_cover3 (A, p, q) in FIG. 14 is called. In this case, since q=5, the procedure select_cover3 (A, 1, 5) is called. At step 1108 in FIG. 14, all elements whose collating position (s, e) satisfies 1<s≦5<e are selected from C, and D1 is obtained. In this example, D1={(WADA, 3, 7)}. Since D1 is not empty, the judgment at step 1109 is No, and the process advances to step 1110. At step 1110, the only element (WADA, 3, 7) is selected from D1, (X, L-2) is recorded in B for each piece of appearance symbol information (X, L, R) of M=WADA, and the selected element is deleted from D1. Further, A∩B is calculated, but there is no common part, so A1 is empty, the judging result at step 1111 is Yes, and the process returns to step 1109. Since D1 is now empty, the judging result at step 1109 is Yes, and control returns from select_cover3. This mode is shown in *STEP8 in FIG. 38.

At step 1102 in FIG. 13, since D is not empty, the process again advances to step 1103. At step 1103, the element (/SUKAWA, -2, 5) is selected from D, A is calculated as in *STEP9, and, advancing to step 1107, the procedure select_cover3 (A, 1, 5) in FIG. 14 is called again. At step 1108 in FIG. 14, all elements whose collating position (s, e) satisfies 1<s≦5<e are selected from C, and D1 is obtained. In this example, D1={(WADA, 3, 7)}. This mode is shown in *STEP10 in FIG. 38. Since D1 is not empty, the judgment at step 1109 is No, and the process advances to step 1110. At step 1110, the only element (WADA, 3, 7) is selected from D1, (X, L-2) is recorded in B for each piece of appearance symbol information (X, L, R) of M=WADA, and the selected element is deleted from D1. Further, A∩B is calculated, but there is no common part, so A1 is empty, the judging result at step 1111 is Yes, and the process returns to step 1109. Since D1 is now empty, the judging result at step 1109 is Yes, and control returns from select_cover3. This mode is shown in *STEP11 in FIG. 38.

At step 1102 in FIG. 13, since D is not empty, the process again advances to step 1103. At step 1103, the element (0/KAWAD, -1, 6) is selected from D, A is calculated as in *STEP12, and, advancing to step 1107, the procedure select_cover3 (A, 1, 6) in FIG. 14 is called once more. At step 1108 in FIG. 14, all elements whose collating position (s, e) satisfies 1<s≦6<e are selected from C, and D1 is obtained. In this example, D1={(WADA, 3, 7)}. This mode is shown in *STEP13 in FIG. 39. Since D1 is not empty, the judgment at step 1109 is No, and the process advances to step 1110.

At step 1110, the only element (WADA, 3, 7) is selected from D1, (X, L-2) is recorded in B for each piece of appearance symbol information (X, L, R) of M=WADA, and the selected element is deleted from D1. Further, A∩B is calculated. This common part A1 is not empty, and u=7 is larger than n=6, so the process advances to step 1113, and A1 is added to SC as part of the final result. Back at step 1109, since D1 is empty, control returns from select_cover3. This mode is shown in *STEP14 in FIG. 39.

At step 1102 in FIG. 13, since D is not empty, the process again advances to step 1103. At step 1103, the element (5/KAWAD, -1, 6) is selected from D, A is calculated as in *STEP15, and, advancing to step 1107, the procedure select_cover3 (A, 1, 6) in FIG. 14 is called again. At step 1108 in FIG. 14, all elements whose collating position (s, e) satisfies 1<s≦6<e are selected from C, and D1 is obtained. In this example, D1={(WADA, 3, 7)}. This mode is shown in *STEP16 in FIG. 39. Since D1 is not empty, the judgment at step 1109 is No, and the process advances to step 1110. At step 1110, the only element (WADA, 3, 7) is selected from D1, (X, L-2) is recorded in B for each piece of appearance symbol information (X, L, R) of M=WADA, and the selected element is deleted from D1. Further, A∩B is calculated. This common part A1 is not empty, and u=7 is larger than n=6, so the process advances to step 1113, and A1 is added to SC as part of the final result. Back at step 1109, since D1 is empty, control returns from select_cover3. This mode is shown in *STEP17 in FIG. 39.

At step 1102 in FIG. 13, since D is not empty, the process again advances to step 1103. At step 1103, the element (SUKAWA, -1, 5) is selected from D, A is calculated as in *STEP18, and, advancing to step 1107, the procedure select_cover3 (A, 1, 5) in FIG. 14 is called once more. At step 1108 in FIG. 14, all elements whose collating position (s, e) satisfies 1<s≦5<e are selected from C, and D1 is obtained. In this example, D1={(WADA, 3, 7)}. This mode is shown in *STEP19 in FIG. 39. Since D1 is not empty, the judgment at step 1109 is No, and the process advances to step 1110. At step 1110, the only element (WADA, 3, 7) is selected from D1, (X, L-2) is recorded in B for each piece of appearance symbol information (X, L, R) of M=WADA, and the selected element is deleted from D1. Further, A∩B is calculated, but there is no common part, so A1 is empty, the judging result at step 1111 is Yes, and the process returns to step 1109. Since D1 is now empty, the judging result at step 1109 is Yes, and control returns from select_cover3. This mode is shown in *STEP20 in FIG. 38.

At step 1102 in FIG. 13, since D is now empty, the assessing process of the symbol number set is terminated. At this moment, SC holds all of the combinations of an infix matching symbol (its number) and the number of characters on the left side of the collating portion (the beginning side of the symbol), so the intermediate coincidence retrieval result is obtained by picking up only the first element of each combination. Thus, by retrieving the meta-symbol appearance information in the symbol dictionary 501 while systematically generating the combinations of covering elements from the covering result 504, the set of symbol numbers commonly contained in each generated set of covering elements can be determined.
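The core of steps 1103 to 1113 is an intersection of appearance sets after shifting each recorded position so that it refers to the place where the question character string would begin. The following is a minimal sketch of that idea only, not the procedure of FIG. 13 and FIG. 14 itself. The names `align` and `symbols_for_chain`, the toy appearance table, the 0-based offsets, and the shift `offset - (s - 1)` (which generalizes the (X, L-2) adjustment for a start position of 3) are assumptions made for illustration.

```python
def align(occurrences, start):
    """Shift each (symbol, offset) pair so that the offset refers to the
    position in the symbol where the query itself would begin.
    `start` is the 1-based collating start position of the covering element."""
    return {(sym, off - (start - 1)) for sym, off in occurrences}


def symbols_for_chain(chain, appearance):
    """Intersect the aligned appearance sets of a chain of covering
    elements (m, s, e) that together cover the whole query."""
    result = None
    for m, s, _e in chain:
        aligned = align(appearance.get(m, set()), s)
        result = aligned if result is None else result & aligned
        if not result:          # no common symbol left: stop early
            return set()
    return result or set()


# Hypothetical appearance table: meta-symbol -> {(symbol number, 0-based offset)}.
# Symbol 4 is imagined as "XXXXX0/KAWADA", covered by ... [0/KAWAD][WADA].
appearance = {
    "KAWA": {(1, 0), (2, 3)},
    "WADA": {(1, 2), (3, 0), (4, 9)},
    "0/KAWAD": {(4, 5)},
}

plain = symbols_for_chain([("KAWA", 1, 5), ("WADA", 3, 7)], appearance)
extended = symbols_for_chain([("0/KAWAD", -1, 6), ("WADA", 3, 7)], appearance)
```

In this toy table, `plain` pairs symbol 1 with offset 0 and `extended` pairs symbol 4 with offset 7, which mirrors how SC holds a symbol number together with the number of characters to the left of the collating portion.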

As explained herein, according to the symbol dictionary retrieving method in the fifth embodiment of the invention, meta-symbol information having meta-symbols with a greater number of characters is compiled for the partial character strings appearing at high frequency in the symbol data. Using this meta-symbol information, the covering result is first composed from the question character string, the elements found by three means, that is, the right extended meta-symbol assessing means, the left extended meta-symbol assessing means, and the both extended meta-symbol assessing means, are added to it, and the retrieval is processed by using this covering result. Therefore, even the intermediate coincidence retrieval of symbols containing partial character strings appearing at high frequency can be done faster than in the conventional retrieval of a symbol dictionary.
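The three kinds of extended meta-symbols can be enumerated with simple string tests. The sketch below is illustrative only: the function names are hypothetical, the meta-symbol list is a subset of the nine meta-symbols named in the example, and the start and end positions are derived from the geometry of the overlap (end minus start equals the meta-symbol length), which is an assumption chosen because it reproduces elements such as (0/KAWAD, -1, 6) and (SUKAWA, -1, 5) from the worked example.

```python
def left_extended(q, z, meta_symbols):
    """Meta-symbols whose end portion is a prefix L = q[:j] of the query.
    z is the covering element at start position 1, so only j >= len(z) is tried."""
    found = set()
    for x in meta_symbols:
        for j in range(len(z), len(q) + 1):
            if x.endswith(q[:j]):
                found.add((x, j + 1 - len(x), j + 1))
    return found


def right_extended(q, z_start, meta_symbols):
    """Meta-symbols whose beginning portion is a suffix R = q[j-1:] of the query.
    z_start is the 1-based start of the covering element with the largest start."""
    found = set()
    for x in meta_symbols:
        for j in range(1, z_start + 1):
            if x.startswith(q[j - 1:]):
                found.add((x, j, j + len(x)))
    return found


def both_extended(q, meta_symbols):
    """Meta-symbols that contain the whole query after their first character."""
    found = set()
    for x in meta_symbols:
        k = x.find(q, 1)                 # 0-based index, at least 1
        while k != -1:
            found.add((x, 1 - k, 1 - k + len(x)))
            k = x.find(q, k + 1)
    return found


metas = ["KAWA", "WADA", "SUKAWA", "0/KAWAD", "0/KAWAN", "5/KAWAD"]
left = left_extended("KAWADA", "KAWA", metas)
right = right_extended("KAWADA", 3, metas)
both = both_extended("KAWADA", metas)
```

With this list, `right` contains only (WADA, 3, 7) and `both` is empty, as in *STEP2 and *STEP4 of the example, while 0/KAWAN is rejected as a left extended meta-symbol because it does not end with a prefix of KAWADA.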

As explained in the five embodiments of the invention herein, according to the symbol dictionary compiling method and symbol dictionary retrieving method of the invention,

-   (1) by compiling automatically a meta-symbol dictionary collecting shorter symbols called “meta-symbols” for covering symbols in symbol data, covering each symbol in the symbol data by the meta-symbols in this meta-symbol dictionary, and compiling meta-symbol appearance information recording the information showing how each symbol is covered in every meta-symbol, and
-   (2) by retrieving the question character string by using the meta-symbol dictionary contained in the symbol dictionary, covering it with the meta-symbols, adding the retrieval results of left, right and both extended meta-symbols to the covering result, and determining the symbol number set contained commonly in every element set in the covering result covering the question character string or its left, right and both extended character strings,

the following problems in the conventional symbol dictionary compiling method and symbol dictionary retrieving method can be solved, that is:

-   1) The symbol dictionary file to be created is more than twice as large as the symbol data to be retrieved, and it is hard to realize if the usable capacity of the memory device is limited.
-   2) If the character string is long, and when retrieving symbols containing characters or character chains of high frequency of appearance, the quantity of data to be retrieved from the symbol dictionary is large, and the retrieval speed is lowered.
-   3) In the method of using character chains, if the number of characters N in a character chain is increased, the number of kinds of N-character chains that appear increases sharply, it becomes hard to compile the symbol dictionary, and the capacity of the compiled symbol dictionary is increased.

Thus, although difficult in the conventional symbol dictionary compiling and retrieving technology, high speed retrieval is possible including up to infix matching, and a symbol dictionary of small capacity can be compiled. Even in applications where complete matching accounts for the majority of questions, symbol retrieval is possible without lowering the average retrieval speed, so that significant practical effects are obtained.

In the foregoing five embodiments, alpha-numerics and special symbols are used as the character set, but the same effects are also obtained with character sets to which Chinese characters and the Greek alphabet are added. In the first embodiment, prior to compilation of the symbol dictionary, a meta-symbol dictionary composed of single characters only, as shown in FIG. 16, is prepared; however, in addition to the content in FIG. 16, a meta-symbol dictionary containing meta-symbols of two or more characters whose appearance can be predicted, such as “1998-” and “AM”, can be prepared, and in this case, too, the symbol dictionary can be compiled by the same procedure as explained above. As the data structure for storing the meta-symbol information, the TRIE structure and the table structure are shown, but the same procedure can also be executed with other data structures, such as a finite state machine, PATRICIA tree, or hash table. The storing format of the meta-symbol appearance information is not limited to the table; the same procedure can be executed by using a TRIE, hash table or other data structure.
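As an illustration of the TRIE option, the sketch below stores a small meta-symbol dictionary in a nested-dictionary trie and uses it to cover a question character string. It is a simplified sketch under stated assumptions, not the procedure of FIG. 7: the function names and the toy dictionary are hypothetical, and the overlapped longest match is interpreted here as taking the longest meta-symbol at each start position and keeping it only when it reaches further right than every element kept so far.

```python
END = None  # terminal marker; cannot collide with a one-character string key


def build_trie(meta_symbols):
    """Build a nested-dict trie; a node containing END closes a meta-symbol."""
    root = {}
    for m in meta_symbols:
        node = root
        for ch in m:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def longest_match(trie, text, start):
    """Length of the longest meta-symbol beginning at text[start], or 0."""
    node, best = trie, 0
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        if END in node:
            best = i + 1 - start
    return best


def cover(trie, q):
    """Return covering elements (m, s, e) with 1-based s and exclusive e,
    or an empty list when some character of q cannot be covered."""
    covering, furthest = [], 0       # furthest: 0-based end already covered
    for i in range(len(q)):
        if i > furthest:             # a character was left uncovered
            return []
        n = longest_match(trie, q, i)
        if n and i + n > furthest:   # keep only elements that extend the cover
            covering.append((q[i:i + n], i + 1, i + n + 1))
            furthest = i + n
    return covering if furthest == len(q) else []


trie = build_trie(["K", "A", "W", "D", "KAWA", "WADA"])
result = cover(trie, "KAWADA")
```

For this toy dictionary, `result` holds the two overlapping elements (KAWA, 1, 5) and (WADA, 3, 7); a question character string containing a character absent from the dictionary yields an empty covering, which corresponds to terminating the retrieval with no result.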

In symbol dictionary retrieval, for the convenience of explanation, the covering result is expressed by using the set, but the same procedure as explained above can also be executed by using a linked list, heap, tree structure, hash table or other data structure.

Thus, in the symbol dictionary compiling method of the invention, for compiling a symbol dictionary of symbol data that is machine-retrievable by complete matching, prefix matching, postfix matching or infix matching, a meta-symbol dictionary collecting shorter symbols called “meta-symbols” for covering the symbols in the symbol data is compiled automatically, each symbol in the symbol data is covered with the meta-symbols in this meta-symbol dictionary, and the information showing how each symbol is covered is obtained by preparing meta-symbol appearance information recorded for each meta-symbol. Therefore, high speed retrieval including up to infix matching is achieved, and the size of the compiled symbol dictionary can be reduced, thereby bringing about outstanding effects.

Also, in the symbol dictionary retrieving method of the invention, for machine-retrieving of a symbol dictionary by complete matching, prefix matching, postfix matching or infix matching, a question character string is covered with meta-symbols by retrieving the meta-symbol dictionary contained in the symbol dictionary, and the retrieval results of left, right and both extended meta-symbols are added to this covering result. By seeking the symbol number set commonly contained in every element set in the covering results covering the question character string or its left, right and both extended character strings, high speed retrieval is possible up to infix matching by using a symbol dictionary of small capacity. Moreover, in applications where complete matching accounts for the majority of questions, symbol retrieval is possible without lowering the average retrieval speed, thereby bringing about outstanding effects.

The effects of the invention appear very clearly when compiling and retrieving a symbol dictionary from large-scale symbol data having a deviated distribution, in which the symbol data to be retrieved contain more than tens of thousands of kinds of symbols, each symbol has a great number of characters, and there are partial character strings commonly included in many symbols. For example, in an experiment of compiling a symbol dictionary from symbol data containing 1 million symbols, in which each symbol is a 100-digit numeral, all symbols are equal in the upper 90 digits, and all symbols are different in the lower 10 digits, the conventional symbol dictionary of the n-gram system needs at least 100 million symbol numbers, appearance character position information, and information for character linking, and its size is more than 400 megabytes. In the symbol dictionary compiled by the symbol dictionary compiling method of the invention, it is required to record only about 50000 kinds of meta-symbol information and about 4 million pieces of meta-symbol appearance information, the required size is smaller than 40 megabytes, and the capacity is less than 1/10 of that of the conventional system. Moreover, in the case of retrieval by complete matching, in the symbol dictionary of the conventional n-gram system, unless the number of links n of character linking is 41 or more, the intermediate result for the upper 40 digits always contains 1 million symbols, and the retrieval speed is substantially lowered. In the retrieval of the symbol dictionary of the invention, a meta-symbol dictionary containing meta-symbols of 40+α digits suited to the deviation of distribution of the symbol data is created automatically, and the symbol number is searched only by referring to the appearance information of the meta-symbols relating to the question character string, so that retrieval at an extremely high speed is realized. Thus, symbol data having deviation, which was conventionally hard to handle, can be retrieved at high speed including up to intermediate coincidence, and outstanding effects are obtained practically.
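The figures quoted in this experiment can be checked with a few lines of arithmetic. The sketch below only restates the numbers given above; treating 400 megabytes as a lower bound, 40 megabytes as an upper bound, and one megabyte as 10^6 bytes are assumptions, and the variable names are illustrative.

```python
symbols = 1_000_000          # number of symbols in the experiment
digits_per_symbol = 100      # each symbol is a 100-digit numeral

# Conventional n-gram dictionary: one record per character position.
ngram_records = symbols * digits_per_symbol
ngram_bytes_min = 400 * 10**6            # "more than 400 megabytes"

# Dictionary of the invention: meta-symbol entries plus appearance records.
meta_records = 50_000 + 4_000_000
meta_bytes_max = 40 * 10**6              # "smaller than 40 megabytes"

record_ratio = ngram_records / meta_records      # how many times fewer records
size_ratio = meta_bytes_max / ngram_bytes_min    # upper bound on the size ratio

print(f"n-gram records: {ngram_records:,}")
print(f"records reduced by a factor of about {record_ratio:.1f}")
print(f"size ratio is below {size_ratio:.2f}")
```

The record count falls by a factor of roughly 25, and because the larger size is a lower bound and the smaller one an upper bound, the size ratio stays below 1/10, in line with the statement above.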

1. A method of retrieving a complete match to an arbitrary character string query using a symbol dictionary containing meta-symbol information and meta-symbol appearance information, comprising the steps of: (a) retrieving meta-symbol information in said symbol dictionary, (b) searching for a covering of the query character string Q by the duplicate longest match word extraction method, that is, covering elements of pair (m, s, e) of meta-symbol m collating with a partial character string, collating start character position s, and collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, and a set containing any character of Q in at least one covering element, (c) storing the results of the search of step (b) in the working area/storage/memory, (d) terminating the retrieval if there is no covering, that is, if there is no set of covering elements of pair (m, s, e) of meta-symbol m collating with said partial character string, said collating start character position s, and said collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, containing at least one covering element for each character of Q, and (e) retrieving the meta-symbol appearance information in said symbol dictionary and, if there is only one symbol number contained commonly in all elements in said covering result, issuing it as the retrieval result and terminating the retrieval process, and if there is no symbol number contained commonly in all elements in said covering result, terminating the retrieval process as being no retrieval result.

2. A method of retrieving by forward coincidence a response to an arbitrary character string query using a symbol dictionary storing meta-symbol information and meta-symbol appearance information, that is, retrieving all symbols having said queried character string in the beginning portion, comprising the steps of: a first step of symbol dictionary retrieval in which (a) a question character string covering means retrieves meta-symbol information in said symbol dictionary and searches for a covering of the query character string Q by the overlapped longest match word extraction method, that is, covering elements of pair (m, s, e) of meta-symbol m collating with a partial character string, collating start character position s, and collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, and a set containing any character of Q in at least one covering element, (b) the retrieval is terminated as being no retrieval result if there is no covering, that is, if there are no covering elements of pair (m, s, e) of meta-symbol m collating with said partial character string, said collating start character position s, and said collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, and there is no set containing any character of Q in at least one covering element, and (c) if there is a covering, the covering result is recorded, a second step of symbol dictionary retrieval in which (d) a right extended meta-symbol assessing means retrieves meta-symbol information in said symbol dictionary, retrieves, in said covering result, all meta-symbols x that are right extended meta-symbols, that is, meta-symbols containing character string R in the beginning portion, of the j-th rightmost partial character string R of said query character string, that is, the partial character string from the j-th character (1≦j≦|Q|) to the final character in the query character string, out of the extended meta-symbols of meta-symbol Z of covering elements of which said collating start character position is 1, that is, meta-symbols containing Z, and adds elements (x, j, |R|+j) to said covering result and records them, and a third step of symbol dictionary retrieval in which a symbol number set assessing means retrieves said meta-symbol appearance information in said symbol dictionary while systematically compiling a set C of elements in said covering result covering said query character string or an arbitrary right extended character string, collects a symbol number set SC commonly contained in all elements of C, records it as part of said retrieval result, and issues the sum set of all SCs as a final retrieval result.

3. A method of retrieving, by backward coincidence, a response to an arbitrary character string query using a symbol dictionary storing meta-symbol information and meta-symbol appearance information, that is, retrieving all symbols having said queried character string in the end portion, comprising the steps of: a first step of symbol dictionary retrieval in which (a) a queried character string covering means retrieves meta-symbol information in said symbol dictionary and searches for a covering of the queried character string Q by the overlapped longest match word extraction method, that is, covering elements of pair (m, s, e) of a meta-symbol m collating with a partial character string, collating start character position s, and collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, and a set containing any character of Q in at least one covering element, (b) the retrieval process is terminated as being no retrieval result if there is no covering, that is, if there are no covering elements of pair (m, s, e) of meta-symbol m collating with said partial character string, said collating start character position s, and said collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, and there is no set containing any character of Q in at least one covering element, and (c) if there is a covering, the covering result is recorded, a second step of symbol dictionary retrieval in which (d) a left extended meta-symbol assessing means retrieves meta-symbol information in said symbol dictionary, retrieves, in said covering result, all meta-symbols x that are left extended meta-symbols, that is, meta-symbols containing a character string L in the end portion, of the j-th leftmost partial character string L of said queried character string, that is, the partial character string from the first character to the j-th character (1≦j≦|Q|) in the queried character string, out of the extended meta-symbols of meta-symbol Z of covering elements of which said collating end character position is |Q|+1, that is, meta-symbols containing Z, and adds elements (x, j+1−|L|, j+1) to said covering result and records them, and a third step of symbol dictionary retrieval in which a symbol number set assessing means retrieves said meta-symbol appearance information in said symbol dictionary while systematically compiling a set C of elements in said covering result covering said queried character string or an arbitrary left extended character string, collects a symbol number set SC commonly contained in all elements of C, records it as part of the retrieval result, and issues the sum set of all SCs as a final retrieval result.

4. A method of retrieving, by intermediate coincidence, a response to an arbitrary character string query using a symbol dictionary storing meta-symbol information and meta-symbol appearance information, that is, retrieving all symbols having said queried character string, comprising: a first step of symbol dictionary retrieval in which a question character string covering means retrieves said meta-symbol information in said symbol dictionary and searches for a covering of the queried character string Q by the overlapped longest match word extraction method, that is, covering elements of pair (m, s, e) of a meta-symbol m collating with a partial character string, collating start character position s, and collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, and a set containing any character of Q in at least one covering element, the retrieval process is terminated with no retrieval result if there is no covering, that is, if there are no covering elements of pair (m, s, e) of said meta-symbol m collating with said partial character string, said collating start character position s, and said collating end character position e (1≦s<e≦|Q|+1) in the character string to be covered, and there is no set containing any character of Q in at least one covering element, and if there is a covering, the covering result is recorded, a second step of symbol dictionary retrieval in which a right extended meta-symbol assessing means retrieves meta-symbol information in said symbol dictionary, retrieves, in said covering result, all meta-symbols x that are right extended meta-symbols, that is, meta-symbols containing a character string R in the beginning portion, of the j-th rightmost partial character string R of said queried character string, that is, the partial character string from the j-th character (1≦j≦|Q|) to the final character in the queried character string, out of the extended meta-symbols of meta-symbol Z of covering elements of which said collating start character position is 1, that is, meta-symbols containing Z, and adds elements (x, j, |R|+j) to said covering result and records them, a third step of symbol dictionary retrieval in which a left extended meta-symbol assessing means retrieves meta-symbol information in said symbol dictionary, retrieves, in said covering result, all meta-symbols x that are left extended meta-symbols, that is, meta-symbols containing a character string L in the end portion, of the j-th leftmost partial character string L of said queried character string, that is, the partial character string from the first character to the j-th character (1≦j≦|Q|) in the queried character string, out of the extended meta-symbols of meta-symbol Z of covering elements of which said collating end character position is |Q|+1, that is, meta-symbols containing Z, and adds elements (x, j+1−|L|, j+1) to said covering result and records them, a fourth step of symbol dictionary retrieval in which a both extended meta-symbol assessing means retrieves the meta-symbol information, retrieves all both extended meta-symbols X of Q, that is, meta-symbols containing character string Q in the portion from the j-th character to the j+|Q|-th character, where 1<j, and adds elements (X, 1−j, 1−j+|X|) to said covering result and records them, and a fifth step of symbol dictionary retrieval in which a symbol number set assessing means retrieves said meta-symbol appearance information in said symbol dictionary while systematically compiling a set C of elements in said covering result covering said queried character string or an arbitrary extended character string, collects a symbol number set SC commonly contained in all elements of C, records it as part of said retrieval result, and issues the sum set of all SCs as a final retrieval result.